INDEX
Explanations
racial bias
This neuron detects mentions of race and racial-group topics, especially content about racial identity, discrimination, representation, or related controversies.
Negative Logits
leaf       -0.08
losse      -0.08
folos      -0.08
postfix    -0.07
roy        -0.07
borrowed   -0.07
topo       -0.07
אית        -0.07
overw      -0.07
ויה        -0.07
Positive Logits
racial          0.20
racial          0.17
黑人            0.16
racism          0.16
ethnic          0.16
ethnicity       0.15
racist          0.15
minorities      0.14
multicultural   0.14
LGBTQ           0.14
Activation Density: 0.291%
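
The page does not define how the density figure is computed. As a rough illustration only, the sketch below assumes it is the fraction of sampled tokens on which the neuron's activation exceeds a threshold. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the source.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the neuron
    fires at all. The source page does not confirm this definition.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: the neuron fires on 3 of 1000 tokens.
sample = [0.0] * 997 + [1.2, 0.4, 3.1]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.300%
```

Under this assumed definition, a density of 0.291% would mean the neuron is active on roughly 3 of every 1000 tokens, which fits a neuron that responds to a narrow topic.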