Explanations
phrases related to politics or society, including mentions of political figures, social issues, and political actions
expressions related to social and political changes
Negative Logits
beforehand      -0.68
oven            -0.65
initially       -0.63
mental          -0.63
cius            -0.62
Conc            -0.59
una             -0.58
prior           -0.57
Interstitial    -0.56
bol             -0.56
Positive Logits
here            0.83
nir             0.74
anew            0.74
finally         0.73
resur           0.70
adays           0.70
aukee           0.69
CLUS            0.68
IRE             0.67
again           0.65
Activations Density: 0.452%