Explanations
numerical representations or counts
Neuron Alignment
Index    Value    % of L₁
369      +0.11    0.6%
177      +0.11    0.6%
11       +0.11    0.6%
Correlated Neurons
Index    P. Corr.    Cos Sim.
51       +0.11       0.06
325      +0.11       0.06
351      +0.11       0.06
Negative Logits
erate         -1.71
bour          -1.66
ifications    -1.51
pires         -1.50
poons         -1.49
holders       -1.46
iner          -1.46
ine           -1.43
ingale        -1.41
ants          -1.39
Positive Logits
¸          1.88
aho        1.86
ī          1.82
«          1.74
į          1.64
asion      1.61
0000000    1.60
st         1.57
âģĦ        1.54
RY         1.54
Activations Density 0.226%