INDEX
Explanations
references to various political figures and their actions in the context of politics and criticism
references to political figures and their actions
Negative Logits
Token       Logit
tml         -0.72
mble        -0.70
lehem       -0.64
Mehran      -0.61
aughtered   -0.60
ãĥĩãĤ£      -0.58
ILCS        -0.58
Berry       -0.57
ranged      -0.56
emed        -0.56
Positive Logits
Token          Logit
's             1.01
ÃŃs            0.86
inaction       0.82
interfering    0.81
wanting        0.79
behaving       0.78
meddling       0.76
shortcomings   0.76
policies       0.74
losing         0.74
Activations Density 0.397%
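
Two entries in the logit lists above, `ãĥĩãĤ£` and `ÃŃs`, look like encoding debris but are plausibly raw vocabulary strings from a byte-level BPE tokenizer. The sketch below assumes, without confirmation from this page, that the tokens come from a GPT-2-style vocabulary, and it rebuilds that scheme's byte-to-character table to recover the underlying UTF-8 text. The helper names `bytes_to_unicode` and `decode_token` are illustrative and not part of any dashboard API.

```python
# Assumption: the vocabulary uses the GPT-2 byte-level BPE convention, in which
# every byte 0..255 is displayed as one printable Unicode character.

def bytes_to_unicode():
    """Rebuild the GPT-2 byte -> display-character table."""
    # Bytes that are already printable are displayed as themselves.
    printable = (
        list(range(0x21, 0x7F))     # '!' .. '~'
        + list(range(0xA1, 0xAD))   # '¡' .. '¬'
        + list(range(0xAE, 0x100))  # '®' .. 'ÿ'
    )
    printable_set = set(printable)
    table = {b: chr(b) for b in printable}
    # Every remaining byte is shifted to code point 256, 257, ... in byte order.
    offset = 0
    for b in range(256):
        if b not in printable_set:
            table[b] = chr(256 + offset)
            offset += 1
    return table


# Invert the table: display character -> original byte value.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Map a displayed vocabulary string back to the text it encodes."""
    raw = bytes(BYTE_DECODER[ch] for ch in token)
    # A token may end mid-character, so invalid sequences are replaced, not raised.
    return raw.decode("utf-8", errors="replace")


print(decode_token("ãĥĩãĤ£"))  # ディ
print(decode_token("ÃŃs"))     # ís
```

Under that assumption, the two odd-looking entries are an ordinary Japanese katakana fragment and an accented suffix. ASCII tokens such as `'s` or `policies` pass through unchanged.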