Explanations
The neuron detects mentions of skin-related descriptors, especially the words “skin,” “tone,” or “type.”
Negative Logits
wildfires        -0.07
oter             -0.07
ignant           -0.07
Adult            -0.07
volunteering     -0.07
ोर               -0.07
conservatives    -0.06
volunteer        -0.06
celebrity        -0.06
edu              -0.06
Positive Logits
=__              0.07
‐‐               0.06
nombre           0.06
'/>↵             0.06
-CN              0.06
학생             0.06
hlub             0.06
/ay              0.06
CGFloat          0.06
витами           0.06
Activations Density: 0.056%