INDEX
Explanations
Punctuation and specific phrases or segments that indicate decision-making or significant moments of change.
Negative Logits
represented   -0.19
meant         -0.18
represented   -0.16
portrayed     -0.16
depicted      -0.15
preceded      -0.15
seen          -0.14
presented     -0.14
supported     -0.14
Seen          -0.14
Positive Logits
ate            0.29
took           0.26
went           0.26
drank          0.26
threw          0.25
got            0.25
blew           0.24
went           0.24
flew           0.24
withdrew       0.24
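Feature dashboards of this kind commonly derive the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the k lowest and k highest scoring tokens. The sketch below assumes that method. It is not the code that produced the tables above, and the function name `logit_effects` and the toy weights are hypothetical.

```python
from typing import List, Tuple


def logit_effects(
    decoder_dir: List[float],       # feature's decoder vector, length d_model (assumed input)
    unembed: List[List[float]],     # unembedding W_U as d_model rows x vocab_size columns
    vocab: List[str],               # token strings, length vocab_size
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: return (k most negative, k most positive) token effects.

    Assumes effect[t] = sum_i decoder_dir[i] * W_U[i][t], i.e. a direct
    decoder-to-logit projection with no layer norm or other intervening layers.
    """
    d_model = len(decoder_dir)
    effects = [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort ascending by effect; the head holds the most negative tokens.
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]   # reverse so the most positive token comes first
    return negative, positive


if __name__ == "__main__":
    # Toy example with d_model = 2 and a 4-token vocabulary (made-up numbers).
    vocab = ["ate", "took", "represented", "seen"]
    decoder_dir = [1.0, -0.5]
    unembed = [
        [0.2, 0.1, -0.1, 0.0],
        [-0.1, 0.0, 0.2, 0.1],
    ]
    neg, pos = logit_effects(decoder_dir, unembed, vocab, k=2)
    print([tok for tok, _ in neg])  # ['represented', 'seen']
    print([tok for tok, _ in pos])  # ['ate', 'took']
```

Because both lists come from one sorted ranking, a token can only appear in one of them. Repeated strings such as "represented" or "went" in the tables above would then be distinct vocabulary entries (for example, with and without a leading space) that render identically.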
Activations Density 0.201%
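A density of 0.201% means the feature fires on roughly 2 of every 1,000 tokens in the sampled text. The following is a minimal sketch that assumes density is the percentage of tokens whose activation is strictly above a threshold of zero. The dashboard does not state its threshold or sample, and `activation_density` is a hypothetical name.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens where the feature exceeds `threshold`."""
    if len(activations) == 0:
        raise ValueError("need at least one token activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # Toy sample: the feature fires on 2 of 1,000 tokens.
    acts = [0.0] * 998 + [1.3, 0.7]
    print(activation_density(acts))  # 0.2
```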