INDEX
Explanations
passages discussing political figures and their actions
Negative Logits
stamped        -0.93
extinguished   -0.85
punch          -0.81
ration         -0.71
stamp          -0.68
fade           -0.68
coughing       -0.68
manoeuv        -0.68
brawl          -0.66
taunt          -0.66
Positive Logits
Previously     1.26
Currently      1.20
Originally     1.08
She            1.04
He             1.03
Serving        1.02
His            1.01
Recently       0.98
Born           0.97
Prior          0.94
Activations Density 0.134%