Explanations
low rank or status
The neuron activates on the word “commoner,” i.e. references to lower-class/common-status individuals.
Negative Logits
unchecked    -0.07
walnut       -0.07
debounce     -0.07
erosis       -0.07
Berlin       -0.07
八           -0.07
-bedroom     -0.07
oplevel      -0.06
Pago         -0.06
सन           -0.06
Positive Logits
getattr          0.06
underestimated   0.06
appropriate      0.06
/apps            0.06
сразу            0.06
configured       0.06
pic              0.06
食べ             0.06
assass           0.06
renewed          0.06
Activations Density: 0.065%