Explanations
the word "for" in various contexts
Negative Logits
antics   -0.16
ỉ        -0.14
aira     -0.14
ala      -0.14
ila      -0.14
apol     -0.14
Hud      -0.14
man      -0.14
stroy    -0.14
ick      -0.14
Positive Logits
ستر          0.18
ooter        0.15
bung         0.15
GOODMAN      0.15
werp         0.15
ayar         0.14
ätz          0.14
Goodman      0.14
문제         0.14
.TestTools   0.14
Activation Density: 0.019%
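The density figure reads as the share of sampled tokens on which the feature is active: 0.019% is about 19 tokens per 100,000. A minimal sketch, assuming "active" means an activation strictly above zero and using made-up activation values (the page states neither its threshold nor its sample size); `activation_density` is a hypothetical helper.

```python
from typing import List

def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)

# Hypothetical sample: the feature fires on 19 of 100,000 tokens.
sample = [0.0] * 100_000
for i in range(19):
    sample[i * 5_000] = 1.0  # 19 distinct firing positions
print(f"{activation_density(sample):.3f}%")  # → 0.019%
```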