Explanations
words related to political events and conflicts
references to political events or government actions
Negative Logits
Ń·                -0.80
erity             -0.74
Down              -0.72
Īè                -0.71
=-=-=-=-=-=-=-=-  -0.66
ials              -0.64
orthern           -0.64
ãĤ®               -0.63
Gladiator         -0.63
çͰ                -0.62
Positive Logits
Cue      0.69
AE       0.68
stellar  0.65
cum      0.64
gag      0.62
dom      0.62
hack     0.62
Boo      0.61
cade     0.61
Cf       0.60
Activations Density 0.000%