Explanations
phrases related to assembling or fitting together different components to create something new
Neuron Alignment
Index    Value    % of L₁
297      +0.12    0.4%
876      +0.08    0.2%
609      +0.07    0.2%
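The "% of L₁" column can be read as each neuron's share of the L₁ norm of the feature's weight vector. The page does not define the column, so that reading is an assumption, and the helper name `l1_share` and the toy vector below are hypothetical. A minimal sketch under that assumption:

```python
def l1_share(weights, top_k=3):
    """Return (index, value, share of L1 norm) for the top_k largest-magnitude weights.

    Assumption: "% of L1" means abs(weight) divided by the sum of abs(weights).
    """
    l1 = sum(abs(w) for w in weights)
    if l1 == 0:
        raise ValueError("weight vector has zero L1 norm")
    # Rank indices by absolute weight, largest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return [(i, weights[i], abs(weights[i]) / l1) for i in order[:top_k]]


# Toy vector for illustration only; these are not the feature's real weights.
toy_weights = [0.5, -0.25, 0.125, 0.125]
rows = l1_share(toy_weights, top_k=2)
for index, value, share in rows:
    print(f"{index:<8}{value:+.2f}    {share:.1%}")
```

Under this reading, a value of +0.12 at 0.4% would imply a total L₁ norm of roughly 30, which is consistent with the other two rows after rounding.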
Correlated Neurons
Index    P. Corr.    Cos Sim.
297      +0.12       0.06
1615     +0.08       0.05
2044     +0.07       0.06
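The two columns above measure different things, which is why they need not agree. Assuming "P. Corr." is the Pearson correlation and "Cos Sim." is the cosine similarity between two vectors (the page does not say which vectors are compared), the sketch below shows the relationship: Pearson correlation is cosine similarity computed after subtracting each vector's mean. The function names and sample vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine_similarity([x - mean_a for x in a], [y - mean_b for y in b])


# Illustrative vectors only; not taken from the dashboard.
u = [1.0, 2.0, 3.0]
v = [3.0, 2.0, 1.0]
cos_uv = cosine_similarity(u, v)        # positive: both vectors have positive entries
pearson_uv = pearson_correlation(u, v)  # negative: one rises while the other falls
```

Because centering removes any shared offset, two vectors can have a high cosine similarity and still be negatively correlated, as in this example.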
Negative Logits
Token        Logit
parteci      -0.98
sappi        -0.87
purtroppo    -0.81
scopri       -0.81
poichè       -0.78
véhic        -0.77
vorrei       -0.77
prosegu      -0.77
perciò       -0.76
vogli        -0.75
Positive Logits
Token             Logit
together          1.04
together          0.87
TOGETHER          0.78
Together          0.73
elkaar            0.72
Together          0.71
combined          0.70
<bos>             0.69
combine           0.67
interconnected    0.66
Activations Density: 0.474%