INDEX
Explanations
The neuron strongly responds to the start-of-text token, i.e., the beginning of a sequence.
Negative Logits
Gone          -0.08
須            -0.08
مرح           -0.08
Wanted        -0.08
bumps         -0.08
회            -0.08
cornerstone   -0.08
/rem          -0.08
births        -0.08
�             -0.07
Positive Logits
vergelijking  0.08
portátil      0.08
出去          0.08
envelop       0.07
pad           0.07
groot         0.07
fb            0.07
Ub            0.07
fw            0.07
grip          0.07
Activations Density 0.221%