INDEX
Explanations
descriptions of the appearance and physical attributes of objects or entities
Neuron Alignment
Index    Value    % of L₁
752      +0.12    0.4%
1535     +0.11    0.3%
674      +0.11    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1276     +0.12       0.04
1262     +0.11       0.04
732      +0.11       0.04
Negative Logits
accanto    -0.84
quegli     -0.79
seoul      -0.77
bangkok    -0.76
autunno    -0.76
virtù      -0.74
venice     -0.74
ciasc      -0.72
orlando    -0.71
alaska     -0.71
Positive Logits
Cuen       0.64
terms      0.57
order      0.52
in         0.52
In         0.51
Lla        0.51
granada    0.50
Tercera    0.50
Perd       0.50
Río        0.50
Activations Density: 0.217%