INDEX
Explanations
phrases related to political correctness and historical narratives regarding race
Negative Logits
esquer        -0.66
ScopeManager  -0.59
eadilan       -0.57
#+#           -0.57
traceback     -0.56
tendre        -0.56
czaj          -0.55
ритори        -0.53
Amal          -0.52
roek          -0.52
Positive Logits
conservative   1.18
conservatives  1.06
Conservative   1.01
Conservative   1.00
conservative   1.00
GOP            0.94
Conservatives  0.88
Republican     0.88
conservatism   0.86
reactionary    0.78
Activations Density 0.731%