INDEX
Explanations
references to geographic locations and their associated societal or cultural contexts
Negative Logits
ind       -0.16
ie        -0.16
inet      -0.15
cum       -0.15
uto       -0.14
Ging      -0.14
188       -0.14
typings   -0.14
cha       -0.14
pag       -0.14
Positive Logits
buz       0.15
Ups       0.15
yš        0.15
sik       0.14
Вики      0.14
kek       0.14
bij       0.14
rico      0.14
olland    0.13
@js       0.13
Activations Density: 0.208%