INDEX
Explanations
references to "sacrifice."
Neuron Alignment
Index    Value    % of L₁
362      +0.13    0.7%
111      +0.13    0.7%
148      +0.12    0.7%
Correlated Neurons
Index    P. Corr.    Cos Sim.
111      +0.13       0.01
448      +0.13       0.01
400      +0.12       0.01
Negative Logits
ĨĴ         -2.56
Ķ          -2.13
Ĭ          -2.06
ĵ          -2.01
happier    -2.01
ı          -1.97
ľĵ         -1.91
Ļª         -1.90
ľ          -1.80
¸          -1.68
Positive Logits
ramento     2.23
erd         2.02
lemental    1.88
ilage       1.86
holder      1.85
idopsis     1.81
othet       1.79
pere        1.79
ceedings    1.70
ULAR        1.63
Activations Density: 0.016%