INDEX
Explanations
words or phrases related to opposition or conflict
references to anti-related sentiments or movements
Negative Logits
Token      Logit
ulhu       -0.93
swick      -0.82
çīĪ        -0.74
ADRA       -0.72
ynski      -0.71
WD         -0.70
ĸļ         -0.67
ħĭ         -0.66
Cohn       -0.65
Sponsor    -0.65
Positive Logits
Token            Logit
government       1.01
establishment    0.99
social           0.98
violence         0.95
Semitic          0.93
commun           0.92
American         0.91
democratic       0.90
capitalist       0.89
aligned          0.89
Activations Density: 0.048%