Explanations
nouns related to political manipulation or extremism
terms related to political and social manipulation
Negative Logits
Token       Logit
staking     -0.74
flix        -0.73
sheet       -0.72
dry         -0.67
EVA         -0.64
wards       -0.63
Kaw         -0.63
scope       -0.63
WWF         -0.62
LAST        -0.62
Positive Logits
Token       Logit
agog        1.49
agogue      1.27
ues         1.05
acles       0.97
urations    0.96
allery      0.95
ically      0.91
atory       0.91
ococ        0.89
owitz       0.89
Activations Density: 0.022%