INDEX
Explanations
text describing notable or remarkable features or observations
Neuron Alignment
Index    Value    % of L₁
411      +0.12    0.4%
663      +0.11    0.4%
168      +0.10    0.4%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1793     +0.12       0.03
411      +0.11       0.03
25       +0.10       0.03
Negative Logits
település       -0.53
Demografie      -0.51
Produzione      -0.50
MBC             -0.50
攷              -0.48
Nuorodos        -0.47
Šaltiniai       -0.46
Explicación     -0.45
Bé              -0.45
Organisateur    -0.44
Positive Logits
notable       0.78
Noice         0.77
Notable       0.76
noteworthy    0.67
maneu         0.65
vogli         0.65
voleva        0.64
nutella       0.63
Notably       0.63
ecru          0.63
Activations Density 0.123%