Explanations
words associated with geographical and cultural references
Negative Logits
emie    -0.17
adies   -0.16
apor    -0.16
oblin   -0.16
ery     -0.16
bane    -0.16
omm     -0.16
heck    -0.16
ég      -0.15
apo     -0.15
Positive Logits
ux      0.38
ix      0.31
UX      0.28
uve     0.27
uv      0.26
iller   0.25
uil     0.24
indre   0.24
ignant  0.24
urs     0.23
Activations Density: 0.042%