INDEX
Explanations
mentions of making progress or taking action
phrases indicating future actions or decisions
Negative Logits
herself       -0.66
Downloadha    -0.65
accompanies   -0.64
denotes       -0.61
uttered       -0.61
assis         -0.61
Sample        -0.61
FUL           -0.60
âĺħ           -0.60
blance        -0.59
Positive Logits
ourselves     1.46
gonna         0.87
[             0.78
everybody     0.78
mble          0.77
selves        0.76
our           0.72
guys          0.71
together      0.71
gotta         0.70
Activations Density 0.483%
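The page does not say how the density figure is computed. A common reading, assumed here, is the fraction of sampled token positions on which the feature has a nonzero activation. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and do not come from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions where the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1,000 token positions, 5 of which activate the feature.
sample = [0.0] * 995 + [1.2, 0.4, 3.1, 0.9, 2.2]
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.500%"
```

Under this reading, a density of 0.483% would mean the feature fires on roughly 1 in 200 sampled tokens.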