INDEX
Explanations
phrases indicating a change or discontinuation of previous actions or habits
repeated expressions of cessation or changes in behavior
Negative Logits
ortment       -0.61
Proceed       -0.60
utmost        -0.59
gged          -0.57
urat          -0.57
romy          -0.56
maximum       -0.54
bounty        -0.53
gist          -0.53
Difference    -0.53
Positive Logits
anymore       0.91
than          0.91
:(            0.84
nces          0.83
;)            0.76
adays         0.73
unless        0.73
than          0.73
:)            0.72
!             0.72
Activations Density: 0.030%