Explanations
code snippets and references to technical documentation
Neuron Alignment
Index   Value   % of L₁
1343    +0.15   0.5%
1870    +0.11   0.4%
304     +0.11   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1618    +0.15      0.05
1331    +0.11      0.05
1045    +0.11      0.04
Negative Logits
Token         Logit
<bos>         -1.04
RTLR          -0.61
on            -0.60
.             -0.60
,             -0.58
as            -0.57
s             -0.57
in            -0.55
consolidate   -0.53
at            -0.53
Positive Logits
Token     Logit
keramik   1.40
alkoh     1.39
kram      1.39
antik     1.35
kosme     1.34
abnorm    1.33
silikon   1.31
robus     1.30
plak      1.30
konkre    1.29
Activations Density: 0.332%