INDEX
Explanations
references to fairness or equitable practices
Neuron Alignment
Index  Value  % of L₁
443    +0.14  0.8%
376    +0.14  0.8%
503    +0.13  0.7%
Correlated Neurons
Index  P. Corr.  Cos Sim.
443    +0.14     0.02
503    +0.14     0.02
286    +0.13     0.01
Negative Logits
other      -1.67
maternal   -1.46
its        -1.45
latest     -1.42
elevated   -1.40
esters     -1.38
plasia     -1.38
elderly    -1.38
beautiful  -1.37
previous   -1.37
Positive Logits
fax      2.01
banks    1.94
uet      1.85
opan     1.82
manship  1.71
cloth    1.71
grounds  1.70
gate     1.70
leigh    1.70
piece    1.64
Activations Density: 0.018%