INDEX
Explanations
phrases related to public figures and their actions, particularly in a political context
phrases related to claims and statements made by individuals, particularly in a political context
Negative Logits
Token     Logit
agar      -0.79
td        -0.70
allery    -0.69
udic      -0.68
ason      -0.67
arist     -0.66
neau      -0.66
ept       -0.66
central   -0.65
Brill     -0.65
Positive Logits
Token     Logit
himself   0.85
surrog    0.84
lewd      0.80
retweet   0.77
tweeting  0.75
Tonight   0.74
gyn       0.73
onstage   0.73
sarcast   0.72
insults   0.71
Activations Density: 0.478%