INDEX
Explanations
politeness
The neuron detects words signaling a polite or courteous tone (e.g., “politely”).
Negative Logits
fats
-0.07
BI
-0.07
ninh
-0.06
mins
-0.06
masturbation
-0.06
اك
-0.06
_send
-0.06
withStyles
-0.06
awareness
-0.06
.smtp
-0.06
Positive Logits
Published
0.07
aves
0.07
″
0.07
}_{
0.06
Override
0.06
atıcı
0.06
offline
0.06
_Renderer
0.06
Contained
0.06
Investig
0.06
Activations Density 0.019%