Explanations
information related to historical events and specific details of objects or concepts
Neuron Alignment
Index   Value   % of L₁
1150    +0.17   0.5%
1510    +0.14   0.4%
630     +0.13   0.4%
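The dashboard does not define the Neuron Alignment columns. A common convention in SAE feature dashboards, and only an assumption here, is that each row is one neuron-basis component of the feature's decoder vector. Under that convention, "Value" is the weight itself and "% of L₁" is its absolute value divided by the L₁ norm of the whole vector. The function name `neuron_alignment` and the toy vector below are hypothetical. Ranking by signed value rather than by absolute value is also an assumption.

```python
def neuron_alignment(decoder_vector, k=3):
    """Return the top-k (index, value, share_of_l1) rows for a decoder vector.

    Assumptions (not confirmed by the dashboard): rows are ranked by signed
    value, largest first, and share_of_l1 = |value| / sum(|w| for all w).
    """
    l1_norm = sum(abs(w) for w in decoder_vector)
    ranked = sorted(range(len(decoder_vector)),
                    key=lambda i: decoder_vector[i], reverse=True)
    return [(i, decoder_vector[i], abs(decoder_vector[i]) / l1_norm)
            for i in ranked[:k]]


if __name__ == "__main__":
    # Hypothetical toy decoder vector, not the real feature's weights.
    toy = [0.02, -0.05, 0.17, 0.01, 0.14, -0.03, 0.13]
    for index, value, share in neuron_alignment(toy):
        print(f"{index:>5}  {value:+.2f}  {share:.1%}")
```

In the real table each share is well under 1% because a residual-stream-sized decoder vector spreads its L₁ mass over many components. The toy vector has only seven entries, so its shares are much larger.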
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
630     +0.17           0.02
580     +0.14           0.02
163     +0.13           0.02
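The Correlated Neurons table pairs two statistics per neuron. This reading is an assumption: the first is the Pearson correlation between the feature's activations and the neuron's activations over the same tokens, and the second is the cosine similarity between a feature weight direction and the neuron's direction. The dashboard does not state which weight vectors the cosine column compares. The sketch below therefore only shows the two formulas on hypothetical toy data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences of activations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def cosine_similarity(u, v):
    """Cosine of the angle between two direction vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Hypothetical per-token activations for one feature and one neuron.
    feature_acts = [0.0, 0.0, 1.2, 0.0, 0.8, 0.0]
    neuron_acts = [0.1, -0.2, 0.9, 0.0, 0.5, 0.3]
    print(f"Pearson corr.: {pearson(feature_acts, neuron_acts):+.2f}")
    # Hypothetical direction vectors.
    print(f"Cosine sim.: {cosine_similarity([1.0, 0.0, 1.0], [0.0, 1.0, 1.0]):.2f}")
```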
Negative Logits
astéro      -0.82
frankfurt   -0.81
Eksteraj    -0.79
">...       -0.78
Redacción   -0.75
Răsp        -0.74
Glej        -0.73
Autoritní   -0.73
Переваги    -0.73
Și          -0.72
Positive Logits
).          0.64
)           0.63
}           0.61
].          0.61
);          0.60
2           0.59
↵↵          0.58
<eos>       0.58
مشين        0.58
.           0.58
Activation Density: 0.039%
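Activation density is normally the fraction of tokens on which the feature fires. A density of 0.039% is 0.00039, or about 39 firing tokens per 100,000. The sketch below assumes that "firing" means an activation strictly above zero, which is the usual choice for ReLU-style SAE features; the dashboard does not state its threshold. The function name `activation_density` and the synthetic data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`.

    Assumption: a token counts as firing when its activation is > 0.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


if __name__ == "__main__":
    # Hypothetical data: 39 nonzero activations among 100,000 tokens.
    acts = [0.0] * 100_000
    for i in range(0, 39 * 1000, 1000):
        acts[i] = 1.0
    print(f"Activation density: {activation_density(acts):.3%}")
```

With 39 firing tokens out of 100,000 this prints a density of 0.039%, matching the figure reported above.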