Explanations
The neuron activates on racial group labels in demographic data (e.g. "White," "African American").
Negative Logits
Kardash      -0.08
Demand       -0.07
→            -0.06
šov          -0.06
dul          -0.06
amac         -0.06
.onerror     -0.06
bdsm         -0.06
PTY          -0.06
AxisSize     -0.06
Positive Logits
rk            0.06
цвета         0.06
(connection   0.06
پیش           0.06
ti            0.06
mia           0.06
利            0.06
neighborhood  0.06
-LAST         0.06
(token        0.06
Activations Density 0.001%
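
To make the density figure concrete, here is a minimal sketch, assuming that "activations density" means the fraction of token positions on which the neuron's activation is above zero. The page itself does not state this definition, and the function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: density is counted per token position, and "active" means
    strictly greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (not from the page): one active token out of 100,000.
toy_activations = [0.0] * 99_999 + [2.5]
density = activation_density(toy_activations)
print(f"{density:.3%}")  # prints "0.001%"
```

Under this reading, a density of 0.001% corresponds to roughly one activating token per 100,000 sampled, which fits a neuron that responds only to a narrow set of labels.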