INDEX
Explanations
references to author names or citations within academic contexts
Neuron Alignment
Index    Value    % of L₁
50       +0.23    1.2%
1741     +0.20    1.0%
1343     +0.17    0.9%
Correlated Neurons
Index    P. Corr.    Cos Sim.
16       +0.23       0.05
1343     +0.20       0.04
981      +0.17       0.04
Negative Logits
<bos>           -2.17
SPECTION        -0.75
become          -0.67
aughters        -0.66
CLUSIVE         -0.66
EMPT            -0.66
seriously       -0.64
PLANATION       -0.63
awakeFromNib    -0.63
brought         -0.63
Positive Logits
accla      1.75
Strukt     1.64
Kategor    1.62
„,         1.54
aen        1.50
§.         1.48
inev       1.45
effe       1.45
increa     1.42
nece       1.41
Activations Density: 0.108%