INDEX
Explanations
references to the United States and its political context
Negative Logits
gressor    -0.15
esi        -0.15
ouch       -0.15
INDER      -0.14
apo        -0.14
ket        -0.13
nominal    -0.13
Heller     -0.13
Trap       -0.13
Sed        -0.13
Positive Logits
eya        0.17
enson      0.17
aina       0.15
Fairfax    0.14
opies      0.14
imity      0.14
aylor      0.14
Duffy      0.14
θα         0.13
antage     0.13
Activations Density 0.232%