INDEX
Explanations
references to historical events or figures, especially related to war or film history
Neuron Alignment
Index  Value  % of L₁
752    +0.16  0.5%
50     +0.14  0.4%
304    +0.13  0.4%
Correlated Neurons
Index  P. Corr.  Cos Sim.
752    +0.16     0.04
50     +0.14     0.04
753    +0.13     0.01
Negative Logits
<bos>        -0.79
flanges      -0.75
ductile      -0.72
shewn        -0.69
exchangers   -0.65
FORMANCE     -0.65
sprigs       -0.65
nutella      -0.64
parenchyma   -0.64
bituminous   -0.63
Positive Logits
confé         1.27
Sén           1.14
vété          1.10
flé           1.05
délib         1.02
clô           0.98
prédé         0.98
habile        0.98
Gouvernement  0.97
fameux        0.96
Activations Density 0.259%