INDEX
Explanations
demographics
The neuron detects words and phrases explicitly referring to race or racial attributes (e.g., “race,” “racial,” “skin color”).
Negative Logits
challenger    -0.08
otted         -0.07
rnd           -0.07
ุค            -0.07
-ground       -0.07
против        -0.07
Trading       -0.06
TR            -0.06
caves         -0.06
Animation     -0.06
Positive Logits
(utf                    0.07
العن                    0.06
.transfer               0.06
(IServiceCollection     0.06
برگزار                  0.06
-main                   0.06
แห                      0.06
ByteArrayInputStream    0.06
huyện                   0.06
/MPL                    0.06
Activations Density: 0.030%