Explanations
The neuron fires on erotic descriptive language that vividly portrays bodies or sexual attributes.
Negative Logits
Microsoft     -0.06
lings         -0.06
jdk           -0.06
Knot          -0.06
ih            -0.06
haft          -0.06
.directory    -0.06
ahrung        -0.05
GridView      -0.05
),            -0.05
Positive Logits
koliv         0.07
pop           0.07
ριστ          0.07
併            0.07
.sex          0.07
šem           0.06
ifton         0.06
سی            0.06
τας           0.06
listeners     0.06
Activations Density 0.008%