INDEX
Explanations
terms related to national identity or national themes
Negative Logits
/he      -0.17
anuts    -0.16
nice     -0.16
nice     -0.15
Nice     -0.15
Nice     -0.15
orgh     -0.15
事情     -0.14
inand    -0.14
elda     -0.14
Positive Logits
/local   0.21
/reg     0.19
/global  0.18
LEGRO    0.17
/world   0.17
ized     0.17
izing    0.16
wide     0.16
nap      0.15
istic    0.15
Activations Density 0.050%
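
As a rough illustration of how a density figure like the one above could be read, the sketch below assumes that "activations density" means the fraction of sampled token positions on which the feature fires (activation above zero). That definition, the function name activation_density, and the sample values are assumptions made for illustration; they are not taken from the tool that produced this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: density = (number of active positions) / (total positions).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 token positions, 5 of which are active.
sample = [0.0] * 9995 + [1.3, 0.7, 2.1, 0.4, 0.9]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.050%
```

Under this reading, a density of 0.050% corresponds to the feature firing on about 5 of every 10,000 tokens, which marks it as a sparse feature.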