Explanations
underwear
The neuron activates on mentions of underwear or related undergarment terms.
Negative Logits
Token         Logit
kle           -0.07
_socket       -0.07
損            -0.07
jes           -0.07
Phase         -0.06
Station       -0.06
projection    -0.06
cos           -0.06
sales         -0.06
jon           -0.06
Positive Logits
Token         Logit
underwear     0.10
erect         0.07
↵             0.07
avir          0.06
dara          0.06
WithContext   0.06
ander         0.06
arose         0.06
arise         0.06
एक            0.06
Activations Density 0.006%
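
The page does not say how the density figure is computed. A minimal sketch, assuming it is the fraction of sampled token positions on which the neuron's activation is above zero; the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of tokens on which the neuron fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 6 active positions out of 100,000 tokens.
acts = [0.0] * 99_994 + [1.2, 0.4, 3.1, 0.9, 2.2, 0.1]
density = activation_density(acts)
print(f"{density:.3%}")
```

Under this reading, a density of 0.006% corresponds to the neuron firing on roughly 6 of every 100,000 tokens, which fits a narrow, topic-specific neuron.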