INDEX
Explanations
sentences starting with "After" or similar patterns indicating a sequence of events
statements about attempts and outcomes in a process or decision-making scenario
Negative Logits
Token         Logit
Previous      -0.68
Enhance       -0.67
Attack        -0.65
differently   -0.62
Secondly      -0.61
Previously    -0.60
Previous      -0.60
PLIED         -0.59
include       -0.59
Also          -0.58
Positive Logits
Token         Logit
finally       1.62
relent        1.32
settled       1.11
prevailed     1.09
reluctantly   1.08
capit         1.03
decided       1.02
eventually    1.02
icably        0.94
succumbed     0.94
Activations Density 0.421%