Explanations
Phrases indicating close proximity to, or the presence of, others
Negative Logits
Token     Logit
uger      -0.17
仔        -0.16
zilla     -0.15
_FP       -0.15
icks      -0.15
McCart    -0.15
Hoover    -0.15
jav       -0.14
ег        -0.14
834       -0.14
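Raw dumps like this one often show some tokens as mojibake, for example ä»Ķ or Ñĥ. This happens because GPT-2-style byte-level BPE stores each byte of a token as a printable stand-in character. The sketch below reverses that mapping. It assumes the model uses the GPT-2 byte-to-unicode table, which the page itself does not confirm. Characters outside the table are treated as text that has already been decoded. Latin-1 letters such as the ä in kä remain ambiguous, because they can be either a stand-in or a real letter.

```python
def bytes_to_unicode() -> dict:
    """GPT-2's byte -> printable stand-in character table (assumed tokenizer)."""
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Bytes without a printable form are shifted to code points 256 and up.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))

UNICODE_TO_BYTE = {c: b for b, c in bytes_to_unicode().items()}

def decode_token(token: str) -> str:
    """Map stand-in characters back to raw bytes, then decode as UTF-8.

    Characters not in the table (assumed already decoded) are re-encoded
    as UTF-8 so that partially decoded tokens still round-trip.
    """
    out = bytearray()
    for ch in token:
        if ch in UNICODE_TO_BYTE:
            out.append(UNICODE_TO_BYTE[ch])
        else:
            out.extend(ch.encode("utf-8"))
    return out.decode("utf-8", errors="replace")

print(decode_token("ä»Ķ"))   # bytes E4 BB 94 -> 仔
print(decode_token("кÑĥл"))  # bytes D0 BA D1 83 D0 BB -> кул
```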
Positive Logits
Token     Logit
rani      0.17
Hear      0.15
kä        0.14
prof      0.14
iore      0.14
кул       0.14
-console  0.13
/on       0.13
nero      0.13
ohl       0.13
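The page does not say how the positive and negative logit lists were computed. A common approach, assumed here, projects the feature's decoder direction through the model's unembedding matrix and keeps the k tokens at each end. The names decoder_dir and W_U, along with the toy weights, are hypothetical and were invented for illustration. They are not this feature's real values. Some tools also fold in the final LayerNorm before projecting, and this sketch skips that step.

```python
from typing import List, Sequence, Tuple

def logit_effects(decoder_dir: Sequence[float],
                  W_U: Sequence[Sequence[float]]) -> List[float]:
    """Direct effect of one unit of feature activation on every vocab logit.

    Computes decoder_dir @ W_U, with W_U laid out as [d_model][vocab].
    Assumption: no final LayerNorm scaling is applied.
    """
    d_model, n_vocab = len(W_U), len(W_U[0])
    return [sum(decoder_dir[i] * W_U[i][t] for i in range(d_model))
            for t in range(n_vocab)]

def top_and_bottom(effects: Sequence[float], vocab: Sequence[str], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, effect) pairs."""
    order = sorted(range(len(effects)), key=lambda t: effects[t])
    bottom = [(vocab[t], effects[t]) for t in order[:k]]
    top = [(vocab[t], effects[t]) for t in reversed(order[-k:])]
    return bottom, top

# Toy example with invented weights: d_model = 2, four placeholder tokens.
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
decoder_dir = [1.0, -0.5]
W_U = [[0.10, -0.10, 0.05, 0.00],
       [-0.10, 0.10, 0.00, 0.20]]

effects = logit_effects(decoder_dir, W_U)
negative, positive = top_and_bottom(effects, vocab, k=2)
print("Negative:", [(t, round(v, 2)) for t, v in negative])
print("Positive:", [(t, round(v, 2)) for t, v in positive])
```

Sorting the full vocabulary is fine for a sketch. With a real vocabulary of about 50,000 tokens, a partial selection such as heapq.nsmallest and heapq.nlargest would avoid the full sort.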
Activation Density: 0.001%
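A density of 0.001% means the feature fires on roughly 1 token in every 100,000 sampled. The sketch below assumes density is the percentage of sampled token positions with an activation above 0. The page does not state the threshold or the sample size, and the toy corpus is invented.

```python
from typing import Sequence

def activation_density(activations: Sequence[Sequence[float]],
                       threshold: float = 0.0) -> float:
    """Percentage of token positions where the activation exceeds threshold.

    activations holds one list of per-token activations per sampled document.
    """
    total = sum(len(doc) for doc in activations)
    active = sum(1 for doc in activations for a in doc if a > threshold)
    return 100.0 * active / total

# Toy corpus: 100,000 tokens in two documents, with one firing position.
docs = [[0.0] * 50_000, [0.0] * 49_999 + [2.3]]
print(f"Activation density: {activation_density(docs):.3f}%")
```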