Explanations
sections related to software licensing and documentation
New Auto-Interp
Neuron Alignment
Index  Value  % of L₁
504    +0.13  0.7%
126    +0.12  0.6%
326    +0.11  0.6%
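The "% of L₁" column reads as each neuron's weight expressed as a share of the L₁ norm of the feature's whole decoder vector. Below is a minimal sketch of that computation. It assumes the decoder vector is given in the neuron basis and that rows are ranked by absolute weight. The helper name `neuron_alignment` and the toy vector are hypothetical and are not the viewer's code.

```python
def neuron_alignment(decoder_vec, top_k=3):
    """Return (index, value, share_of_l1) for the top_k largest-magnitude weights.

    Assumption: share_of_l1 = |w_i| / sum_j |w_j| over the whole decoder vector.
    """
    l1 = sum(abs(w) for w in decoder_vec)  # L1 norm of the decoder vector
    # Rank neuron indices by absolute weight, largest first (ties keep index order).
    ranked = sorted(range(len(decoder_vec)),
                    key=lambda i: abs(decoder_vec[i]), reverse=True)
    return [(i, decoder_vec[i], abs(decoder_vec[i]) / l1) for i in ranked[:top_k]]


# Toy 3-neuron decoder vector (hypothetical values).
for idx, val, share in neuron_alignment([0.5, -0.25, 0.25]):
    print(f"{idx}  {val:+.2f}  {share:.1%}")
```

On the dashboard's numbers, a weight of +0.13 at 0.7% of L₁ implies a decoder vector whose L₁ norm is roughly 0.13 / 0.007 ≈ 19. That figure is only approximate because the displayed values are rounded.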
Correlated Neurons
Index  Pearson Corr.  Cosine Sim.
292    +0.13          0.03
126    +0.12          0.02
128    +0.11          0.01
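Pearson correlation is the covariance of two series divided by the product of their standard deviations. Cosine similarity is the same ratio computed without mean-centering. The sketch below implements both for plain Python lists. The vectors the viewer feeds into each measure are an assumption here; one possibility is feature and neuron activations over a shared token sample. The function names `pearson` and `cosine` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: covariance over the product of standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(sxx * syy)


def cosine(u, v):
    """Cosine similarity: dot product over the product of Euclidean norms."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))


# Hypothetical activations of one feature and one neuron over four tokens.
feature_acts = [0.0, 1.0, 2.0, 3.0]
neuron_acts = [0.0, 2.0, 4.0, 6.0]
print(round(pearson(feature_acts, neuron_acts), 3),
      round(cosine(feature_acts, neuron_acts), 3))
```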
Negative Logits
dot         -1.52
ever        -1.34
ogle        -1.33
blocking    -1.32
hack        -1.25
ctrl        -1.24
bear        -1.24
bracket     -1.22
emitter     -1.20
depression  -1.20
Positive Logits
ittal       1.69
wagen       1.67
usal        1.53
£           1.49
].)         1.49
³           1.44
imes        1.43
mith        1.41
ues         1.40
ischen      1.39
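Per-token logit weights like these are commonly obtained by projecting a feature's decoder direction onto each token's unembedding vector and keeping the most positive and most negative scores. The sketch below follows that approach. It assumes a direct path from the feature to the logits and ignores the final LayerNorm. `logit_effects`, the toy unembedding matrix, and the three-token vocabulary are all hypothetical.

```python
def logit_effects(decoder_vec, unembed, vocab, k=2):
    """Score each token by the dot product of the decoder direction and its unembedding.

    decoder_vec: list of d_model floats.
    unembed: d_model rows, each holding one float per vocabulary token.
    Returns (top_positive, top_negative) as lists of (token, score) pairs.
    Simplification (assumption): direct path only, no final LayerNorm.
    """
    scores = [
        sum(decoder_vec[i] * unembed[i][t] for i in range(len(decoder_vec)))
        for t in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])  # ascending by score
    return ranked[::-1][:k], ranked[:k]


# Toy example: d_model = 2, vocabulary of three tokens (all values hypothetical).
decoder = [1.0, 0.5]
W_U = [[1.0, -1.0, 0.0],
       [2.0, 0.0, -4.0]]
top_pos, top_neg = logit_effects(decoder, W_U, ["a", "b", "c"], k=1)
print("positive:", top_pos, "negative:", top_neg)
```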
Activation Density 0.100%
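Activation density reads as the fraction of tokens on which the feature fires, so 0.100% corresponds to roughly one token in a thousand. Below is a minimal sketch. It assumes that "fires" means an activation strictly above zero. Both the threshold and the function name `activation_density` are assumptions.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation is strictly above `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: the feature fires on 1 token out of 1,000.
sample = [0.0] * 999 + [3.2]
print(f"Activation Density {activation_density(sample):.3%}")
```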