Explanations
statistical and numerical differences within social issues, especially concerning racial disparities and political polling data
Neuron Alignment
Index   Value   % of L₁
876     +0.10   0.3%
1253    +0.09   0.3%
799     +0.09   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
799     +0.10      0.04
1499    +0.09      0.05
525     +0.09      0.02
Negative Logits
Juf           -0.80
Ruman         -0.80
Manufact      -0.79
Guel          -0.76
philanth      -0.76
inev          -0.76
secon         -0.74
Khart         -0.72
volunte       -0.71
Sted          -0.71
Positive Logits
difference    0.73
difference    0.69
differences   0.64
AddTagHelper  0.61
Δ             0.60
awtextra      0.58
Difference    0.57
margin        0.56
diferença     0.55
margin        0.55
Activations Density 0.364%