INDEX
Explanations
references to different file formats
Neuron Alignment
Index    Value    % of L₁
77       +0.14    0.8%
376      +0.12    0.7%
156      +0.12    0.6%
Correlated Neurons
Index    P. Corr.    Cos Sim.
501      +0.14       0.01
77       +0.12       0.01
9        +0.12       0.01
Negative Logits
jours      -1.51
itely      -1.48
unts       -1.44
§          -1.36
spite      -1.35
their      -1.35
vain       -1.33
wonders    -1.33
---|---    -1.32
their      -1.31
Positive Logits
mith       1.78
ium        1.78
horizon    1.71
creen      1.69
ricular    1.69
chool      1.67
ensor      1.63
helf       1.60
icum       1.60
etting     1.58
Activations Density 0.015%