Explanations
This neuron activates on the question word “how.”
Negative Logits
Token       Logit
,False      -0.07
sentence    -0.07
-driver     -0.07
sees        -0.06
descricao   -0.06
[]{↵        -0.06
notation    -0.06
_was        -0.06
.Registry   -0.06
десят       -0.06
Positive Logits
Token       Logit
how         0.09
How         0.08
How         0.07
assignable  0.07
أب          0.06
propor      0.06
.esp        0.06
возмож      0.06
Ease        0.06
امکان       0.06
Activations Density: 0.042%