Explanations
the occurrence of specific phrases or sentences
common phrases and quotations
Negative Logits
ÄŁ             -0.75
fman           -0.74
llah           -0.69
appointments   -0.69
contrace       -0.67
hemor          -0.67
oÄŁ            -0.66
gur            -0.63
Thro           -0.63
ockets         -0.63
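Token strings such as ÄŁ and oÄŁ in the negative-logit list look like raw vocabulary entries from a GPT-2-style byte-level BPE tokenizer, where every byte is mapped to a printable character. The sketch below assumes that tokenizer convention (it is not stated in the source) and inverts the byte-to-character table to recover the underlying UTF-8 text. `decode_token` is a hypothetical helper name; the table itself mirrors the one GPT-2 uses.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Map each byte 0-255 to a printable character (GPT-2 byte-level BPE convention)."""
    # Bytes that are already printable keep their own code point.
    byte_values = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    code_points = byte_values[:]
    n = 0
    for b in range(256):
        if b not in byte_values:
            # Non-printable bytes are shifted to code points starting at U+0100.
            byte_values.append(b)
            code_points.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Inverse table: printable vocabulary character -> original byte.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Hypothetical helper: turn a byte-level BPE vocabulary string back into text."""
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    # A single token can hold an incomplete UTF-8 sequence, so replace bad bytes.
    return raw.decode("utf-8", errors="replace")


for tok in ["ÄŁ", "oÄŁ", "fman"]:
    print(repr(tok), "->", repr(decode_token(tok)))
```

Under this assumption, ÄŁ is the byte pair C4 9F, i.e. the Turkish letter "ğ", and oÄŁ is "oğ"; plain ASCII fragments like fman decode to themselves.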
Positive Logits
phrase         1.15
ology          1.04
uttered        1.03
phrase         0.92
phrases        0.89
coined         0.89
terday         0.84
stress         0.84
slogan         0.80
stick          0.77
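Dashboards of this kind commonly build the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the largest and smallest entries. The NumPy sketch below assumes a decoder vector `w_dec` of length d_model and an unembedding matrix `W_U` of shape (d_model, vocab_size); `top_logits` is a hypothetical helper, and the toy weights and four-token vocabulary are illustrative only, not this feature's actual values.

```python
import numpy as np


def top_logits(w_dec, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, logit) pairs."""
    # Direct logit effect of the feature direction on every vocabulary entry.
    logits = w_dec @ W_U  # shape: (vocab_size,)
    order = np.argsort(logits)  # ascending
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example: d_model = 2, vocab_size = 4 (hypothetical values).
W_U = np.array([[1.0, 0.0, -1.0, 0.5],
                [0.0, 1.0, 0.0, 0.5]])
vocab = ["a", "b", "c", "d"]
w_dec = np.array([1.0, 2.0])

neg, pos = top_logits(w_dec, W_U, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```

Because the projection is a single matrix product, the lists describe which output tokens the feature directly promotes or suppresses, independent of any particular input.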
Activation Density: 0.060%
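Activation density is usually reported as the fraction of sampled tokens on which the feature fires, i.e. has an activation above zero; 0.060% corresponds to about 6 tokens in every 10,000. The sketch assumes a zero threshold and a flat array of per-token activations; `activation_density` is a hypothetical helper and the synthetic activations are illustrative, not data from this feature.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of token positions where the feature activation exceeds threshold."""
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))


# Synthetic example: 6 firing positions out of 10,000 tokens.
acts = np.zeros(10_000)
acts[:6] = 1.3
density = activation_density(acts)
print(f"{density:.3%}")
```

A density this low marks the feature as sparse: it stays silent on almost all text and fires only on a narrow set of contexts.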