Explanations
references to specific political figures or figures related to politics
mentions of political figures, particularly during discussions of their actions and controversies
Negative Logits
actionDate    -0.74
lehem         -0.70
DAQ           -0.69
Mandatory     -0.65
sembly        -0.62
Mehran        -0.61
!/            -0.61
Artemis       -0.61
ammy          -0.61
Redd          -0.60
Positive Logits
himself       1.13
's            0.95
aides         0.93
Himself       0.88
swore         0.86
spends        0.85
prefers       0.84
complains     0.82
believes      0.80
knows         0.79
Activations Density 0.225%