Explanations
previously seen
The neuron activates on phrases in which the author states how many times they have experienced something (e.g., “seen” it).
Negative Logits
bro         -0.07
_u          -0.06
のだ        -0.06
灣          -0.06
ouflage     -0.06
粒          -0.06
Benny       -0.06
/drivers    -0.06
değildir    -0.06
_ord        -0.06
Positive Logits
₁           0.07
подс        0.07
�           0.06
-ts         0.06
Lights      0.06
जल          0.06
",";↵       0.06
atts        0.06
Patton      0.06
itaire      0.06
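The two logit lists rank vocabulary tokens by how strongly this neuron pushes their logits down or up. One common way to produce such lists (assumed here, not confirmed by this page) is to multiply the neuron's output weight vector by the model's unembedding matrix and take the k smallest and k largest entries. The sketch below uses made-up toy weights and a hypothetical five-token vocabulary; `w_out`, `W_U`, and `top_logit_effects` are illustrative names, not part of any dashboard API.

```python
import numpy as np

def top_logit_effects(w_out, W_U, vocab, k=2):
    """Rank tokens by the neuron's direct effect on their logits.

    Assumption: effect = w_out @ W_U, i.e. the neuron's output direction
    projected straight onto the unembedding, ignoring later layers.
    """
    effects = w_out @ W_U                 # one value per vocabulary token
    order = np.argsort(effects)           # indices sorted ascending
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example (hypothetical values): d_model = 3, vocabulary of 5 tokens.
vocab = ["bro", "Benny", "the", "Lights", "Patton"]
w_out = np.array([1.0, 0.0, -1.0])
W_U = np.array([
    [-0.05, -0.03, 0.00,  0.04,  0.01],
    [ 0.10,  0.10, 0.10,  0.10,  0.10],
    [ 0.02,  0.03, 0.00, -0.02, -0.04],
])

neg, pos = top_logit_effects(w_out, W_U, vocab)
print("negative:", neg)
print("positive:", pos)
```

Because only the direct path through the unembedding is measured, tokens with small values like these (around ±0.07) indicate a weak direct effect on the output distribution.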
Activation Density: 0.046%
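Activation density is the share of token positions on which the neuron is active. The threshold this page uses is not stated, so the sketch below assumes "active" means an activation strictly greater than zero; `activation_density` and the sample data are hypothetical.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumption: 'active' means strictly greater than the threshold.
    """
    acts = np.asarray(activations, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Hypothetical sample: the neuron fires on 23 of 50,000 tokens.
acts = np.zeros(50_000)
acts[:23] = 1.5
print(f"{activation_density(acts):.3f}%")  # 0.046%
```

At 0.046%, the neuron is active on roughly one token in every 2,200, which fits the narrow pattern described in the explanation.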