Explanations
phrases related to political activism and resistance
Negative Logits
hindsight      -0.86
VERTISEMENT    -0.82
Presumably     -0.75
Interest       -0.75
avorable       -0.73
laughs         -0.72
reviewer       -0.69
Laughs         -0.67
Random         -0.67
debugging      -0.66
Positive Logits
unite          1.16
salute         1.16
pledge         1.13
vow            1.03
united         1.02
Together       1.00
patri          0.98
belong         0.95
owe            0.93
liberate       0.93
Activation Density: 0.367%