INDEX
Explanations
phrases related to names starting with "Da."
Neuron Alignment

| Index | Value | % of L₁ |
|-------|-------|---------|
| 1271  | +0.18 | 0.8%    |
| 1023  | +0.13 | 0.5%    |
| 1896  | +0.13 | 0.5%    |
Correlated Neurons

| Index | P. Corr. | Cos Sim. |
|-------|----------|----------|
| 1271  | +0.18    | 0.03     |
| 1950  | +0.13    | 0.02     |
| 1141  | +0.13    | 0.02     |
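Negative Logits

| Token      | Logit |
|------------|-------|
| perciò     | -0.59 |
| accanto    | -0.59 |
| ovunque    | -0.56 |
| poichè     | -0.53 |
| pertanto   | -0.53 |
| altrimenti | -0.50 |
| indietro   | -0.50 |
| adesso     | -0.49 |
| ovviamente | -0.49 |
| Quindi     | -0.48 |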
Positive Logits

| Token | Logit |
|-------|-------|
| Da    | 1.09  |
| Da    | 1.06  |
| DA    | 1.02  |
| da    | 0.94  |
| DA    | 0.85  |
| da    | 0.81  |
| Dahl  | 0.79  |
| Dal   | 0.72  |
| grati | 0.70  |
| Dal   | 0.68  |
Activations Density: 0.119%