INDEX
Explanations
This neuron detects legal text specifying racial eligibility or classification (e.g. references to “white persons” in statutes or rulings).
Negative Logits
WebDriver   -0.07
Family      -0.07
füg         -0.06
унд         -0.06
investor    -0.06
add         -0.06
ieten       -0.06
Seattle     -0.06
Honda       -0.06
参照        -0.06
Positive Logits
proving     0.06
vlan        0.06
/String     0.06
_##         0.06
vara        0.06
ธ           0.06
preserve    0.06
lifts       0.06
discrete    0.06
FF          0.06
Activations Density: 0.029%