Explanations
prepositions and expressions of direction or movement
Negative Logits
Token        Logit
somewhere    -0.16
none         -0.15
humans       -0.15
none         -0.15
None         -0.15
sez          -0.14
oir          -0.14
.none        -0.14
ά            -0.14
quet         -0.13
Positive Logits
Token        Logit
everything   0.71
everything   0.60
Everything   0.56
Everything   0.54
一切         0.46
every        0.45
tudo         0.45
EVERY        0.40
everyone     0.40
alles        0.39
Activations Density 0.042%
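
The density figure can be read as a simple ratio. The sketch below assumes that "activation density" means the fraction of sampled token positions on which the feature's activation is above zero; the dashboard does not define the term, so treat that reading as an assumption. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and chosen only so the arithmetic reproduces 0.042%.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: density is counted per token position, and a position is
    "active" when its activation is strictly greater than the threshold.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 21 active positions out of 50,000 sampled tokens.
sample = [0.0] * 49_979 + [1.3] * 21
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.042%
```

Under this reading, a density of 0.042% corresponds to roughly 4 active positions per 10,000 tokens.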