INDEX
Explanations
phrases related to actions or intentions involving going, doing, or saying
expressions of intention or desire related to actions and activities
Negative Logits
requires         -0.75
Increasing       -0.69
marked           -0.68
unsurprisingly   -0.67
flagged          -0.66
surprisingly     -0.65
millenn          -0.65
Advantage        -0.65
strikingly       -0.64
noteworthy       -0.63
Positive Logits
stay             1.32
get              1.18
participate      1.18
finish           1.13
talk             1.12
listen           1.10
survive          1.08
hear             1.07
marry            1.06
communicate      1.06
Activations Density 0.414%
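
For readers unfamiliar with the density figure, the sketch below shows one common way such a number could be computed: the fraction of token positions where the feature's activation is above a threshold. The function name, the zero threshold, and the toy data are assumptions for illustration only; the page does not say how its 0.414% value was produced.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Hypothetical helper: the threshold of 0.0 is an assumption, not
    something stated on the dashboard.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (made up): 1000 token positions, 4 of which are active.
acts = [0.0] * 996 + [0.5, 1.2, 0.3, 2.0]
density = activation_density(acts)
print(f"{density:.3%}")  # prints "0.400%"
```

Under this reading, a density of 0.414% would mean the feature fires on roughly 4 of every 1,000 tokens sampled.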