Explanations
personal pronouns and references to individual identity or possession
Negative Logits
lj      -0.15
oko     -0.15
hip     -0.14
hip     -0.14
Hip     -0.14
piler   -0.14
도      -0.14
arak    -0.14
diplom  -0.14
Ney     -0.14
Positive Logits
iosa      0.17
anmar     0.16
irut      0.15
defenses  0.15
chet      0.14
untas     0.14
ürn       0.14
エル      0.14
ocard     0.14
etime     0.14
Activation Density: 0.411%
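Activation density is commonly read as the share of token positions on which a feature fires (activation above zero) across the sampled text. Below is a minimal sketch of that arithmetic. It assumes a threshold of zero and a flat list of per-token activations. The function name `activation_density` is hypothetical and not part of any dashboard API.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of token positions whose activation exceeds `threshold`.

    Assumption: "firing" means strictly greater than the threshold (default 0.0).
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical data: the feature fires on 4 of 1,000 token positions.
acts = [0.0] * 1000
for i in (3, 250, 512, 999):
    acts[i] = 1.2

print(f"{activation_density(acts):.3f}%")  # 0.400%
```

Under this reading, the 0.411% reported above means the feature is active on roughly 4 of every 1,000 tokens, which makes it a sparse feature.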