Explanations
The neuron activates on words and phrases related to diversity, equity, empowerment, inclusion, and similar social values.
Negative Logits
Token        Logit
_coverage    -0.07
Hair         -0.07
слід         -0.06
노           -0.06
_bundle      -0.06
topo         -0.06
_users       -0.06
_Level       -0.06
setName      -0.06
Swan         -0.06
Positive Logits
Token        Logit
ائی          0.07
_PAD         0.07
ASN          0.07
그러          0.07
Ğ            0.06
reserved     0.06
particularly 0.06
.VK          0.06
тен          0.06
-Col         0.06
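The page does not say how the logit lists above were produced. A common approach (assumed here, not confirmed by the source) is to take the dot product of the neuron's output direction with each token's unembedding vector, then report the most negative and most positive scores. The sketch below illustrates that idea on a toy vocabulary; the function names `logit_effects` and `top_and_bottom` and the toy vectors are hypothetical.

```python
def logit_effects(direction, unembed):
    """Score each token by the dot product of the neuron's output direction
    with that token's unembedding vector (assumed method, for illustration)."""
    return {
        tok: sum(d * u for d, u in zip(direction, vec))
        for tok, vec in unembed.items()
    }


def top_and_bottom(direction, unembed, k):
    """Return the k most-promoted and k most-suppressed tokens with scores."""
    scores = logit_effects(direction, unembed)
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending by score
    bottom = ranked[:k]         # most negative: tokens the neuron pushes down
    top = ranked[::-1][:k]      # most positive: tokens the neuron pushes up
    return top, bottom


# Hypothetical 2-dimensional toy example.
direction = [1.0, -0.5]
unembed = {"a": [0.1, 0.0], "b": [0.0, 0.2], "c": [0.05, 0.05]}
top, bottom = top_and_bottom(direction, unembed, k=1)
print(top, bottom)
```

In a real model the vectors would have thousands of dimensions and the vocabulary tens of thousands of entries, which is why such lists often surface tokens (like several above) that look unrelated to the neuron's activating contexts.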
Activation density: 0.057%
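Activation density is usually read as the fraction of sampled tokens on which the neuron fires. Assuming "fires" means an activation above zero (the page does not state its threshold), 0.057% corresponds to 0.00057, or roughly 57 active tokens per 100,000. The helper below, `activation_density`, is a hypothetical sketch of that calculation.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.
    The threshold of 0.0 is an assumption; dashboards may use other cutoffs."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    return active / total if total else 0.0


# Toy data: 57 firing tokens out of 100,000 gives 0.057%.
acts = [0.0] * 99_943 + [1.0] * 57
density = activation_density(acts)
print(f"{density:.3%}")
```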