Explanations
specific keywords indicating a temporal sequence or context transition
the word "Before" indicating prior events or contexts
Negative Logits
Token      Logit
aren       -0.83
utter      -0.79
pez        -0.68
wire       -0.66
ILY        -0.65
è¦ļéĨĴ     -0.65
hyde       -0.65
hack       -0.63
erry       -0.63
amount     -0.62
Positive Logits
Token      Logit
cluding    0.75
rely       0.72
pping      0.72
irement    0.71
realizing  0.71
Stats      0.70
noon       0.70
anyone     0.66
concluding 0.64
hand       0.64
Activations Density: 0.035%