Explanations
The neuron is mainly looking for vulgar and inappropriate terms
slang terms related to male genitalia and insults
Negative Logits
Token           Logit
carbohydrate    -0.78
EV              -0.70
carbohydrates   -0.68
ocally          -0.66
EVA             -0.65
occ             -0.64
Ket             -0.63
heny            -0.63
sugars          -0.62
uries           -0.61
Positive Logits
Token           Logit
dick            1.22
prick           0.92
holes           0.90
asshole         0.87
abase           0.86
hole            0.84
amn             0.79
gers            0.79
ometer          0.78
yright          0.77
Activations Density: 0.010%