INDEX
Explanations
names of countries or regions
Negative Logits
Token       Logit
gerald      -0.82
igue        -0.80
MENTS       -0.80
lli         -0.80
llo         -0.76
eus         -0.74
MENT        -0.74
igor        -0.73
ashtra      -0.71
lla         -0.70
Positive Logits
Token       Logit
ì           1.02
Jong        1.00
ë           1.00
ongyang     0.93
Kardashian  0.92
orea        0.86
ë           0.86
stress      0.85
Korea       0.84
peninsula   0.83
Activations Density: 0.858%