INDEX
Explanations
code snippets or programming-related terms
Neuron Alignment
Index    Value    % of L₁
876      +0.18    0.5%
2034     +0.16    0.5%
1699     +0.12    0.4%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1169     +0.18       0.03
1216     +0.16       0.02
876      +0.12       0.00
Negative Logits
ujedno      -1.08
katastro    -0.91
Punj        -0.87
empêche     -0.84
Czechos     -0.84
conflic     -0.83
dépasse     -0.82
entraîne    -0.82
kompres     -0.81
konflik     -0.80
Positive Logits
uteurs     0.93
Chinois    0.89
COOKIE     0.86
lapin      0.82
espé       0.79
mignon     0.77
veau       0.77
auguri     0.77
coq        0.77
{;         0.77
Activations Density 0.078%