INDEX
Explanations
This neuron primarily detects occurrences of the word “sex.”
Negative Logits
Token       Logit
oub         -0.07
Soup        -0.06
Address     -0.06
절          -0.06
_m          -0.06
-back       -0.06
ेल          -0.06
-sm         -0.06
NETWORK     -0.06
shape       -0.06
Positive Logits
Token       Logit
Cherokee    0.07
tienes      0.07
hybrids     0.07
(register   0.07
бо          0.06
cevap       0.06
Atatürk     0.06
fuck        0.06
demonstr    0.06
alleging    0.06
Activations Density: 0.017%