INDEX
Explanations
pronouns followed by a verb indicating an action
the pronoun "We."
Negative Logits
LSD           -0.66
quo           -0.65
bud           -0.64
INGTON        -0.63
REDACTED      -0.63
onies         -0.60
Publication   -0.60
Chad          -0.58
steroids      -0.57
cum           -0.57
Positive Logits
asel          1.05
IRD           0.98
're           0.98
ighed         0.95
bsite         0.95
selves        0.94
've           0.90
chwitz        0.87
ldon          0.87
'll           0.86
Activations Density: 0.107%