Explanations
This neuron fires on requests for “dirty talk”: it activates on the word “dirty” when a user asks the model to talk dirty.
Negative Logits
/scripts    -0.06
Meadows     -0.06
.generic    -0.06
mechanic    -0.06
:::/        -0.06
nach        -0.06
.maximum    -0.06
Wheeler     -0.06
zengin      -0.06
了          -0.06
Positive Logits
Lease       0.07
важа        0.06
Dro         0.06
SNAP        0.06
PROJECT     0.06
ambil       0.06
offsets     0.06
Case        0.06
aşam        0.06
FLOAT       0.06
Activation density: 0.027%
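The page does not define activation density. A common reading is the fraction of token positions on which the neuron's activation is above zero. The sketch below assumes that definition; the function name activation_density, the threshold parameter, and the example activations are hypothetical and exist only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens with activation > threshold.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical example: the neuron is active on 27 of 100,000 token positions.
acts = [0.0] * 100_000
for i in range(27):
    acts[i * 1000] = 1.5  # arbitrary positive activation value

print(f"{activation_density(acts):.3%}")  # 0.027%
```

Under this reading, a density of 0.027% means the neuron is active on roughly 27 tokens in every 100,000, which fits a feature that responds only to a narrow kind of request.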