Explanations
phrases related to political and financial manipulation
Negative Logits
Token          Logit
unan           -0.17
democrat       -0.16
éĿ©åij½         -0.15
ritel          -0.14
conte          -0.14
itou           -0.14
totalitarian   -0.14
ocracy         -0.14
addin          -0.14
elman          -0.14
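
One entry in the negative logits list, éĿ©åij½, does not read as ordinary text. It looks like the printable stand-in notation that GPT-2 style byte-level BPE vocabularies use for raw bytes. The sketch below is a minimal decoder under that assumption; the dashboard does not state which tokenizer it uses, and the function names are illustrative rather than part of any tool. It also assumes the "ij" in the token is the single character U+0133, which is why the token is written with escapes.

```python
def byte_level_table():
    """Byte -> printable character map in the style of GPT-2 byte-level BPE (assumed tokenizer)."""
    # Bytes that are already printable map to themselves.
    keep = list(range(33, 127)) + list(range(161, 173)) + list(range(174, 256))
    table = {b: chr(b) for b in keep}
    # Every other byte is shifted to code points 256 and up, in byte order.
    shift = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + shift)
            shift += 1
    return table


# Inverse map: printable stand-in character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in byte_level_table().items()}


def decode_token(token):
    """Turn a byte-level token string back into readable UTF-8 text."""
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


# The token shown as éĿ©åij½, written with escapes to avoid copy-paste ambiguity.
token = "\u00e9\u013f\u00a9\u00e5\u0133\u00bd"
print(decode_token(token))  # prints 革命 ("revolution")
```

If the assumption holds, the entry is the Chinese word for "revolution", which sits naturally beside "totalitarian" and "ocracy" in the same list. Plain ASCII tokens such as "moderate" pass through the decoder unchanged.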
Positive Logits
Token          Logit
moderate       0.47
moder          0.42
Moderate       0.38
moderation     0.37
cent           0.35
Moder          0.33
Moder          0.31
centr          0.28
left           0.28
mod            0.26
Activations Density 0.271%