INDEX
Explanations
phrases that indicate ownership or association
Negative Logits
ucht      -0.17
vale      -0.15
ай        -0.15
Sab       -0.14
attempt   -0.14
Holmes    -0.14
iaux      -0.14
217       -0.14
abinet    -0.14
Sea       -0.14
Positive Logits
yre       0.15
nop       0.15
SHOT      0.14
lops      0.14
eld       0.14
lude      0.14
Cah       0.13
Naked     0.13
öh        0.13
elda      0.13
Activations Density: 0.061%