Explanations
actions and emotions
The main thing this neuron does is detect explicit sexual or pornographic terms.
Negative Logits
agger           -0.07
URIComponent    -0.07
Executors       -0.07
issa            -0.07
chatting        -0.07
LineColor       -0.07
camp            -0.06
millet          -0.06
/r              -0.06
an              -0.06
Positive Logits
Epic            0.06
抽              0.06
gment           0.06
[_              0.06
ดร              0.06
оюз             0.06
prefix          0.06
listOf          0.06
flipped         0.06
convenience     0.06
Activations Density: 0.005%