Explanations
This neuron activates on sentence-initial tokens, e.g. the first word of a sentence or a marker that opens it.
Negative Logits
Token       Logit
wor         -0.07
иболее      -0.07
dishonest   -0.07
Docs        -0.07
Reminder    -0.07
ificação    -0.07
Fakat       -0.06
“[          -0.06
_cash       -0.06
�           -0.06
Positive Logits
Token       Logit
Faces       0.06
Holds       0.06
scoop       0.06
(equal      0.06
zte         0.06
resign      0.06
uplicates   0.06
hứ          0.06
NA          0.06
rhs         0.06
Activations Density 0.064%
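For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed. It assumes that "activation density" means the share of tokens on which the neuron's activation exceeds a threshold (zero here); the page itself does not define the term. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen for illustration only.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: density is the fraction of tokens on which the neuron fires.
    The dashboard does not state its exact definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 100,000 tokens, of which 64 have a nonzero activation.
sample = [0.0] * 99_936 + [1.5] * 64
density = activation_density(sample)
print(f"{density:.3f}%")
```

Under this assumed definition, a density of 0.064% would mean the neuron fires on roughly 64 out of every 100,000 tokens.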