INDEX
Explanations
numerical values, particularly those related to percentages or ratings
Neuron Alignment
Index   Value   % of L₁
156     +0.17   0.9%
106     +0.14   0.8%
369     +0.14   0.8%
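As a rough illustration of how a "% of L₁" column could be read, the sketch below assumes the figure is the absolute value of one entry divided by the L₁ norm (the sum of absolute values) of the whole vector. That reading is an assumption, not something the dashboard states, and the function name `l1_share` and the toy vector are hypothetical. Under that reading, a value of +0.17 at 0.9% would imply an L₁ norm of roughly 18.

```python
from typing import Sequence


def l1_share(weights: Sequence[float], index: int) -> float:
    """Return |weights[index]| as a percentage of the vector's L1 norm.

    Assumption: "% of L1" means one entry's magnitude relative to the
    sum of absolute values of all entries in the same vector.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; the share is undefined")
    return 100.0 * abs(weights[index]) / l1_norm


# Toy vector (hypothetical, not taken from the dashboard).
toy_weights = [0.5, -0.25, 0.25]
print(l1_share(toy_weights, 0))
print(l1_share(toy_weights, 1))
```

The sign of the entry is dropped on purpose, since a share of a norm is a magnitude; the signed value is what the "Value" column already reports.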
Correlated Neurons
Index   P. Corr.   Cos Sim.
458     +0.17      0.04
106     +0.14      0.03
436     +0.14      0.03
Negative Logits
NOTES            -1.49
OSS              -1.47
eur              -1.44
Providence       -1.40
owski            -1.39
League           -1.38
rail             -1.38
ielder           -1.37
responsibility   -1.35
keit             -1.35
Positive Logits
Ľ    2.25
ĭ    2.22
³    2.18
¶    2.06
Į    2.04
th   2.03
ģ    2.02
ħ    1.96
µ    1.96
ĺ    1.91
Activations Density: 0.056%