INDEX
Explanations
References to events or actions that happened before a certain point in time.
Neuron Alignment
Index   Value   % of L₁
871     +0.18   0.6%
568     +0.13   0.5%
889     +0.11   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
568     +0.18      0.03
871     +0.13      0.02
1059    +0.11      0.02
Negative Logits
Token             Value
roh               -0.45
Sikhs             -0.45
izdel             -0.43
?>">              -0.41
kõik              -0.41
SceneManagement   -0.41
najbol            -0.41
hashlib           -0.40
rehearing         -0.39
kys               -0.39
Positive Logits
Token     Value
prior     0.99
prior     0.99
Prior     0.96
PRIOR     0.94
Prior     0.93
priors    0.92
PRIOR     0.82
venuto    0.79
sentito   0.78
affor     0.75
Activations Density: 0.037%