Explanations
words related to criticism, evaluation, information sharing, and knowledge (New Auto-Interp)
Neuron Alignment
Index   Value   % of L₁
1499    +0.15   0.5%
876     +0.08   0.2%
401     +0.08   0.2%
Correlated Neurons
Index   Pearson Corr.   Cos. Sim.
1499    +0.15           0.03
666     +0.08           0.02
1593    +0.08           0.02
Negative Logits
(       -0.83
.       -0.82
        -0.82
,       -0.80
↵↵      -0.80
in      -0.80
↵       -0.78
;       -0.74
.       -0.73
-       -0.73
Positive Logits
milano   1.95
marcato  1.89
dispen   1.88
tremb    1.83
pessi    1.82
nutr     1.82
ritard   1.82
doman    1.82
igno     1.80
napoli   1.79
Activation Density: 0.074%