INDEX
Explanations
words related to elections, political victories, and political events
Negative Logits
advising    -0.63
senal       -0.62
bian        -0.60
rouch       -0.59
OSE         -0.59
rete        -0.58
forth       -0.57
filler      -0.57
appa        -0.57
Factor      -0.57
Positive Logits
now         0.98
nings       0.96
trophies    0.83
hearts      0.81
ipeg        0.80
victories   0.78
cest        0.78
ced         0.78
't          0.77
throp       0.76
Activations Density 4.313%