Explanations
This neuron flags explicit erotic or sexual content, particularly words referring to nudity or semi-nudity.
Negative Logits
حداقل    -0.08
İŞ    -0.07
bic    -0.06
']="    -0.06
میزان    -0.06
/ws    -0.06
켜    -0.06
言って    -0.06
+y    -0.06
ِل    -0.06
Positive Logits
nude    0.08
Nude    0.08
Norse    0.07
distortion    0.07
nudity    0.07
German    0.06
Holland    0.06
defense    0.06
expose    0.06
StyleSheet    0.06
Activations Density: 0.004%