Explanations
words related to the political situation in some region
references to specific people or entities in news contexts
Negative Logits
gling      -0.82
glim       -0.79
pins       -0.73
Ü          -0.72
ateurs     -0.71
livious    -0.68
ateur      -0.68
ensical    -0.67
anted      -0.67
manship    -0.66
Positive Logits
eta        1.22
emia       0.87
quez       0.87
ichi       0.81
fter       0.78
fters      0.78
zza        0.78
ilon       0.77
Rossi      0.76
Centauri   0.76
Activations Density: 0.008%