INDEX
Explanations
references to social and racial issues, particularly concerning white privilege and reparations
This neuron fires on text from diverse contexts (legal documents, political commentary, shopping forums, educational content) with no coherent pattern, suggesting it may be misfiring or tracking a spurious correlation rather than a meaningful linguistic feature.
Negative Logits
rumahnya   -0.35
Wassers    -0.35
ibunya     -0.34
istrinya   -0.33
ferner     -0.33
Njema      -0.31
alguno     -0.31
quelcon    -0.30
hujan      -0.30
nämlich    -0.30
Positive Logits
MigrationBuilder   1.19
betweenstory       0.97
WebElementEntity   0.92
című               0.91
tagHelperRunner    0.90
Autoritní          0.84
للمعارف            0.81
становника         0.80
ब्रेकडाउन            0.79
zwiſchen           0.77
Activations Density 2.203%