INDEX
Explanations
phrases related to processes or steps in a sequence
Neuron Alignment
Index   Value   % of L₁
1937    +0.20   0.7%
50      +0.15   0.6%
478     +0.14   0.5%
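The Neuron Alignment table appears to rank individual neurons by their weight in the feature's decoder direction. Its "% of L₁" column would then give each weight's share of that vector's total L₁ norm. The following is a minimal sketch of that computation. It assumes the decoder direction is a plain list of floats and that neurons are ranked by signed value. `neuron_alignment` and the toy vector are hypothetical.

```python
def neuron_alignment(decoder_row, k=3):
    """Return (neuron index, weight, share of L1 norm) for the k largest weights.

    Hypothetical helper: decoder_row is assumed to be the feature's decoder
    direction expressed in the neuron basis, as a list of floats.
    """
    l1_norm = sum(abs(w) for w in decoder_row)
    # Rank neurons by signed weight, largest first (assumed ordering).
    ranked = sorted(range(len(decoder_row)), key=lambda i: decoder_row[i], reverse=True)
    return [(i, decoder_row[i], abs(decoder_row[i]) / l1_norm) for i in ranked[:k]]


if __name__ == "__main__":
    toy_row = [0.1, -0.05, 0.4, 0.2, -0.25]  # made-up weights with an L1 norm of 1.0
    for index, value, share in neuron_alignment(toy_row):
        print(f"{index:<6}{value:+.2f}   {share:.1%}")
```

Read this way, the first row of the table implies a decoder L₁ norm of roughly 0.20 / 0.007 ≈ 29.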
Correlated Neurons
Index   Pearson Corr.   Cos. Sim.
1937    +0.20           0.11
862     +0.15           0.06
478     +0.14           0.06
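In the Correlated Neurons table, the correlation column most likely holds a Pearson correlation between the feature's activations and a neuron's activations over the same tokens. The cosine-similarity column most likely compares a pair of weight vectors. Which vectors the dashboard compares is not stated here, so the sketch below takes two generic ones. All function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length activation sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if std_x == 0 or std_y == 0:
        return 0.0  # undefined for constant inputs; this sketch reports 0
    return cov / (std_x * std_y)


def cosine_similarity(u, v):
    """Cosine of the angle between two weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_correlated_neurons(feature_acts, neuron_acts, k=3):
    """Rank neurons by Pearson correlation with the feature's activations.

    neuron_acts[i] is assumed to hold neuron i's activations on the same tokens.
    """
    scores = [(i, pearson(feature_acts, acts)) for i, acts in enumerate(neuron_acts)]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:k]
```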
Negative Logits
silikon    -0.99
karton     -0.96
viendra    -0.95
exé        -0.95
provoque   -0.90
pama       -0.86
keramik    -0.82
susun      -0.81
kafe       -0.81
véhic      -0.81
Positive Logits
gonna        0.67
supposed     0.66
able         0.65
considered   0.64
taken        0.64
shown        0.62
allowed      0.60
deemed       0.60
actually     0.59
going        0.59
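The Negative and Positive Logits lists appear to show the tokens whose output logits the feature most suppresses and most promotes. One common way to estimate this direct effect is the dot product of the feature's decoder direction with each token's unembedding vector. The sketch below assumes that method and ignores any final LayerNorm. The names and toy data are hypothetical.

```python
def logit_effects(decoder_dir, unembed_rows, vocab, k=10):
    """Return (most negative, most positive) tokens by direct logit effect.

    Assumes unembed_rows[j] is token vocab[j]'s unembedding vector, in the same
    basis as decoder_dir. Any final LayerNorm scaling is ignored.
    """
    scores = [
        (token, sum(d * w for d, w in zip(decoder_dir, row)))
        for token, row in zip(vocab, unembed_rows)
    ]
    scores.sort(key=lambda s: s[1])  # ascending: most suppressed tokens first
    return scores[:k], scores[::-1][:k]


if __name__ == "__main__":
    negative, positive = logit_effects(
        decoder_dir=[1.0, 0.0],
        unembed_rows=[[0.5, 1.0], [-2.0, 0.0], [3.0, 0.0]],
        vocab=["a", "b", "c"],
        k=1,
    )
    print(negative, positive)  # [('b', -2.0)] [('c', 3.0)]
```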
Activation Density: 0.546%
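Activation density is presumably the fraction of tokens in the evaluation set on which the feature fires, meaning it has a positive activation. The following is a minimal sketch under that assumption. The `threshold` parameter and the toy data are hypothetical.

```python
def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`."""
    if not acts:
        return 0.0
    return sum(1 for a in acts if a > threshold) / len(acts)


if __name__ == "__main__":
    toy_acts = [0.0, 0.0, 1.3, 0.0]  # made-up activations over four tokens
    print(f"{activation_density(toy_acts):.3%}")  # 25.000%
```

Under this reading, a density of 0.546% means the feature is active on roughly 5.5 of every 1,000 tokens.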