Explanations
This neuron fires on the first content word at the start of a new sentence or segment.
Negative Logits
уры    -0.07
slu    -0.07
impost    -0.07
穴    -0.06
арам    -0.06
jr    -0.06
Neville    -0.06
nelle    -0.06
او    -0.06
wastewater    -0.06
Positive Logits
Towards    0.08
лючается    0.07
\brief    0.07
TOR    0.06
(^    0.06
<=$    0.06
ustrial    0.06
Above    0.06
농    0.06
فصل    0.06
Activations Density 0.079%