Explanations
coefficient
This neuron activates on mentions of “coefficient” (and related forms) in numeric contexts.
Negative Logits
Token         Logit
165           -0.08
Actualizar    -0.07
walk          -0.07
323           -0.07
PARK          -0.07
325           -0.07
armed         -0.07
park          -0.07
Mary          -0.06
uptime        -0.06

Positive Logits
Token         Logit
coefficient   0.12
coefficients  0.12
Coefficient   0.08
efficient     0.08
Cohen         0.08
coeff         0.08
Coeff         0.08
Wheel         0.07
conquest      0.07
Coleman       0.07

Activations Density 0.005%