INDEX
Explanations
terms related to intense anger or frustration
the word "rage"
instances of the word "rage"
Negative Logits
icut        -0.91
missions    -0.76
arent       -0.76
herty       -0.74
lder        -0.74
uchin       -0.71
iden        -0.71
Liberties   -0.71
poon        -0.71
metics      -0.70
Positive Logits
rage         1.12
quit         0.97
raging       0.92
fury         0.92
furnace      0.84
raged        0.84
naire        0.80
indignation  0.79
Rage         0.74
anger        0.70
Activation Density: 0.008%