INDEX
Explanations
names or references to specific locations
references to specific locations and associated events related to political or social issues
Negative Logits
ertodd     -0.84
orem       -0.77
redit      -0.68
ITH        -0.66
luster     -0.66
Ĥª         -0.66
rint       -0.64
swing      -0.63
matic      -0.63
Wyoming    -0.61
Positive Logits
Mubarak         0.84
asser           0.82
ée              0.82
mone            0.81
============    0.79
xual            0.78
éĹĺ             0.76
anca            0.73
Ara             0.73
ousse           0.72
Activations Density: 0.042%