Explanations
references to 'people' and their opinions or behaviors
Negative Logits
stadt       -0.15
erner       -0.15
nici        -0.14
venes       -0.13
uzu         -0.13
duk         -0.13
urga        -0.13
ocre        -0.13
ãģĿ         -0.13
udiantes    -0.13
Positive Logits
rosso       0.18
ipo         0.16
might       0.15
say         0.15
forget      0.15
arus        0.15
Ðĭ          0.15
talk        0.14
432         0.14
raud        0.14
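Dashboards of this kind commonly derive the logit lists by projecting the feature's decoder direction through the model's unembedding matrix and reading off the most positive and most negative tokens. The sketch below illustrates that idea only; it assumes plain Python lists for the decoder direction and a per-token unembedding column, ignores any final layer norm, and the names `logit_effects` and `top_and_bottom` are hypothetical, not part of any dashboard's API.

```python
# Hypothetical sketch: rank vocabulary tokens by how strongly a feature's
# decoder direction raises or lowers their logits (final layer norm ignored).
from typing import Dict, List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Dot product of the feature direction with each token's unembedding column."""
    return {
        tok: sum(d * w for d, w in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by value
    negative = ranked[:k]
    positive = sorted(ranked[-k:], key=lambda kv: kv[1], reverse=True)
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional example; the vectors are invented for illustration.
    direction = [1.0, 0.0]
    columns = {
        "say": [0.15, 0.3],
        "talk": [0.14, -1.0],
        "stadt": [-0.15, 2.0],
        "nici": [-0.14, 0.5],
    }
    pos, neg = top_and_bottom(logit_effects(direction, columns), k=2)
    print("positive:", pos)
    print("negative:", neg)
```

In a real model the direction and columns would have the residual-stream width, and the ranking would run over the full vocabulary; the toy vectors here only show the mechanics.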
Activation Density: 0.088%
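Activation density is usually read as the percentage of sampled tokens on which the feature fires, so 0.088% corresponds to roughly 88 firing tokens per 100,000. The sketch below assumes "fires" means an activation strictly above a threshold of zero; the function name `activation_density` and the threshold parameter are illustrative assumptions, not taken from the dashboard.

```python
# Hypothetical sketch: percentage of tokens whose feature activation
# exceeds a threshold (assumed to be 0.0 by default).
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of activations strictly greater than `threshold`."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)
```

For example, 88 non-zero activations among 100,000 tokens gives a density of 0.088%.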