Explanations
references to individuals or possessive pronouns
Negative Logits
оби     -0.17
ội      -0.15
shal    -0.15
SKIP    -0.15
iband   -0.15
voy     -0.14
pike    -0.14
нося    -0.14
ipel    -0.14
ァ      -0.14
Positive Logits
芝      0.15
minated 0.15
ë       0.15
station 0.14
azer    0.14
IST     0.14
chia    0.14
estre   0.13
ază     0.13
é¹      0.13
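The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down or up. A common way to build such lists for a sparse-autoencoder feature is to project the feature's decoder direction through the model's unembedding matrix and sort the result. The sketch below assumes that method. It uses hypothetical names (`top_logit_effects`, `decoder_dir`, `W_U`, `vocab`) and toy data, and it is not this dashboard's actual code.

```python
import numpy as np


def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Return the k most negative and k most positive direct logit effects.

    Assumed shapes (hypothetical, for illustration):
      decoder_dir: (d_model,)        the feature's decoder direction
      W_U:         (d_model, d_vocab) unembedding matrix
      vocab:       list of d_vocab token strings
    """
    # Direct effect of one unit of the feature on every token's logit.
    logits = decoder_dir @ W_U  # shape (d_vocab,)
    # Indices sorted from most negative to most positive effect.
    order = np.argsort(logits)
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


if __name__ == "__main__":
    # Toy example: 2-dimensional residual stream, 4-token vocabulary.
    decoder_dir = np.array([1.0, -1.0])
    W_U = np.array([[0.5, -0.2, 0.1, 0.0],
                    [0.0, 0.3, -0.1, 0.2]])
    neg, pos = top_logit_effects(decoder_dir, W_U, ["a", "b", "c", "d"], k=2)
    print("Negative:", neg)
    print("Positive:", pos)
```

Small magnitudes like the ±0.13 to ±0.17 values above are typical of this kind of direct projection. It ignores any downstream layers between the feature and the output.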
Activation Density: 0.007%
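Activation density is typically the percentage of token positions on which the feature fires. At 0.007%, that is roughly 7 tokens in every 100,000. The following is a minimal sketch, assuming "fires" means an activation strictly above zero; the threshold used for this page is not stated. The function name `activation_density` is hypothetical.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the feature activation exceeds threshold.

    Assumption: threshold=0.0 matches a ReLU-style SAE, where any positive
    activation counts as the feature firing.
    """
    acts = np.asarray(activations, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size


if __name__ == "__main__":
    # Toy data: the feature fires on 7 of 100,000 tokens.
    acts = np.zeros(100_000)
    acts[:7] = 1.0
    print(f"{activation_density(acts):.3f}%")
```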