Explanations
The neuron detects mentions of “first time” (or similar phrasing) that signal a repeated or prior occurrence.
Negative Logits
nostra          -0.07
sanctuary       -0.07
excluding       -0.07
혹              -0.07
_DAY            -0.06
rador           -0.06
Saga            -0.06
논              -0.06
NAS             -0.06
236             -0.06

Positive Logits
openid          0.06
multit          0.06
(=)             0.06
↵               0.06
_almost         0.06
weit            0.06
concentrates    0.06
)":             0.06
Stap            0.06
correlated      0.06
Activations Density 0.009%
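
The page does not define the density figure above. A minimal sketch, assuming it means the percentage of sampled tokens on which the neuron's activation is above zero, might look like the following. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and only illustrate the arithmetic.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of sampled tokens on which the
    neuron fires at all; the source page does not state its definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Synthetic sample (not real data): 9 active tokens out of 100,000.
sample = [0.0] * 99_991 + [1.5] * 9
density = activation_density(sample)
print(f"{density:.3f}%")
```

Under this reading, a density of 0.009% would correspond to roughly 9 active tokens per 100,000 sampled, which fits a neuron that responds only to a narrow phrasing.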