Explanations
quantities
This neuron activates on the word “multiple” when it is used to mean more than one of something.
Negative Logits
eater           -0.07
-Mart           -0.07
říž             -0.07
-business       -0.07
intimidation    -0.06
Shel            -0.06
decoder         -0.06
ंश              -0.06
Succ            -0.06
tire            -0.06
Positive Logits
']);            0.07
metaph          0.06
тех             0.06
しない          0.06
$con            0.06
?</             0.06
void            0.06
ajout           0.06
нив             0.06
코              0.06
Activations Density 0.066%