INDEX
Explanations
phrases related to taking action and making decisions
repeated references to the word "we."
Negative Logits
Token            Logit
REDACTED         -0.72
Publication      -0.71
gratification    -0.67
odor             -0.66
ions             -0.62
Tai              -0.59
Nay              -0.58
more             -0.58
Eleven           -0.58
Rowe             -0.57
Positive Logits
Token            Logit
've              1.32
're              1.27
'll              1.09
asel             1.07
ourselves        1.07
'd               1.05
athered          1.03
IRD              1.02
ibo              0.95
lder             0.94
Activations Density 0.258%
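
The page does not say how the density figure is computed. A minimal sketch, assuming it is the fraction of sampled tokens on which the feature's activation is above zero: the function name `activation_density`, the `threshold` parameter, and the sample data below are all hypothetical and used only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    fires at all; the source page does not confirm this definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 tokens, 258 of which activate the feature.
sample = [0.0] * 99_742 + [1.5] * 258
density = activation_density(sample)
print(f"{density:.3%}")
```

Under that assumption, a density of 0.258% would mean the feature fires on roughly 1 in every 390 tokens, which fits a feature tied to a narrow context such as contractions following "we".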