INDEX
Explanations
references to specific geographical locations or countries
Negative Logits
olt          -0.16
ul           -0.16
individual   -0.14
ुआ           -0.14
fu           -0.14
gro          -0.14
agy          -0.14
ул           -0.14
raft         -0.13
sd           -0.13
Positive Logits
itself       0.20
ifen         0.19
herself      0.18
Himself      0.17
ilies        0.17
ービ         0.16
himself      0.15
Wick         0.15
lamaz        0.15
Česk         0.15
Activations Density 0.059%