INDEX
Explanations
The neuron selectively activates on the word “perverted,” i.e. markers of sexual perversion.
New Auto-Interp
Negative Logits
:aload
-0.06
fullName
-0.06
BoxFit
-0.06
Stars
-0.05
"'.$
-0.05
//'
-0.05
bru
-0.05
textTheme
-0.05
.StatusCode
-0.05
Matcher
-0.05
POSITIVE LOGITS
Prov
0.08
eur
0.08
etin
0.07
perv
0.07
delivering
0.07
avity
0.07
convenience
0.07
within
0.07
BILE
0.07
spécial
0.07
Activations Density 0.005%