INDEX
Explanations
instances of the word "for."
Negative Logits
zew       -0.17
wit       -0.15
že        -0.14
agos      -0.14
ÑĮÑı      -0.13
ember     -0.13
림        -0.13
anke      -0.13
unders    -0.13
edy       -0.13
Positive Logits
usat        0.17
ksen        0.15
transition  0.14
_mgmt       0.14
è¯Ŀ         0.14
igue        0.14
kovi        0.13
ght         0.13
лиж         0.13
ackers      0.13
Activations Density 0.154%
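
The page does not define how the density figure is computed. A minimal sketch, assuming it is the fraction of sampled token positions on which the feature has a non-zero activation; the function name, the threshold parameter, and the sample data are all hypothetical:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires at all. The source page does not confirm this definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 2 of 1,000 token positions.
sample = [0.0] * 998 + [1.2, 0.4]
print(f"{activation_density(sample):.3%}")  # prints 0.200%
```

Under this reading, a density of 0.154% would correspond to the feature firing on roughly 1 or 2 of every 1,000 token positions.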