Explanations
The neuron fires on explicit erotic or sexual content.
Negative Logits
Token        Logit
_LEVEL       -0.07
Earn         -0.07
omen         -0.07
GET          -0.07
essay        -0.07
LIN          -0.06
eventual     -0.06
Capital      -0.06
Victim       -0.06
.RES         -0.06
Positive Logits
Token        Logit
าหล          0.07
sûr          0.07
thì          0.07
ちょ         0.06
arasındaki   0.06
ifications   0.06
sound        0.06
นาม          0.06
roph         0.06
pickle       0.06
Activations Density: 0.008%