INDEX
Explanations
names of individuals, particularly women
Neuron Alignment
Index  Value  % of L₁
410    +0.21  1.2%
503    +0.15  0.8%
369    +0.14  0.8%
Correlated Neurons
Index  P. Corr.  Cos Sim.
410    +0.21     0.13
503    +0.15     0.08
369    +0.14     0.04
Negative Logits
etc          -1.75
erm          -1.52
rac          -1.47
stitutional  -1.47
legraph      -1.46
ram          -1.44
idential     -1.44
unto         -1.41
"/>          -1.41
onal         -1.40
Positive Logits
herself      2.99
Mae          1.92
udeau        1.76
issance      1.76
Louise       1.70
she          1.69
beth         1.66
gigg         1.65
Rice         1.65
Jane         1.61
Activations Density 0.821%