INDEX
Explanations
attracting attention
The neuron fires on erotically charged or provocative language: words that emphasize sexualized, attention-grabbing descriptions.
Negative Logits
Cait        -0.07
Hubb        -0.07
Eating      -0.07
كان         -0.07
Merr        -0.06
_rad        -0.06
iação       -0.06
measuring   -0.06
ると        -0.06
.Year       -0.06
Positive Logits
_WRONG                0.06
grupo                 0.06
Freeze                0.06
-establish            0.06
.gb                   0.06
_fix                  0.06
рий                   0.06
[token not rendered]  0.06
indows                0.06
endor                 0.06
Activations Density 0.027%