INDEX
Explanations
phrases related to political figures and their actions or statements
references to political criticism and controversies involving specific politicians
Negative Logits
ILCS         -0.82
prest        -0.67
Cf           -0.64
---------    -0.63
pires        -0.58
~/           -0.58
PDATED       -0.57
=~           -0.56
dots         -0.56
Somewhere    -0.56
Positive Logits
harshly        0.94
accusing       0.88
merciless      0.84
hypocrisy      0.84
for            0.81
hypoc          0.80
relentlessly   0.80
alleging       0.79
repeatedly     0.76
tactics        0.76
Activations Density 0.390%