INDEX
Explanations
phrases mentioning specific political figures
references to specific names or entities, particularly those that are significant figures or organizations
Negative Logits
UAL         -0.78
ewski       -0.73
Gork        -0.69
ateurs      -0.68
nings       -0.68
sing        -0.68
rament      -0.67
noon        -0.66
Required    -0.64
IBLE        -0.64
Positive Logits
Haram        1.04
oro          1.00
vernment     0.97
NetMessage   0.90
oko          0.88
lé           0.87
annis        0.83
issan        0.76
wana         0.75
heastern     0.74
Activations Density: 0.017%