Explanations
phrases indicating contrast/consequences
em dashes and phrases that connect ideas or indicate ongoing thoughts
Negative Logits
obser       -0.84
lifes       -0.77
sleeper     -0.77
subsequ     -0.73
cons        -0.73
omorphic    -0.72
ccording    -0.72
itton       -0.71
ĵĺ          -0.71
stabil      -0.70
Positive Logits
––          0.87
---         0.85
_-          0.84
[[          0.81
————        0.81
WATCHED     0.81
Cosponsors  0.81
âĸº         0.80
âĢķ         0.80
SEE         0.79
Activations Density: 0.031%