INDEX
Explanations
verbs related to taking action or following instructions
the phrase "you have" in various contexts
Negative Logits
usp        -0.70
peed       -0.67
Discuss    -0.65
inance     -0.65
cession    -0.64
Else       -0.63
hyde       -0.63
icy        -0.59
wi         -0.59
ima        -0.58
Positive Logits
gotta         1.03
yourself      0.93
probably      0.93
heard         0.90
guessed       0.88
choices       0.82
undoubtedly   0.82
doubtless     0.82
seen          0.81
yourselves    0.80
Activations Density: 0.081%