INDEX
Explanations: numerical values or counts
Neuron Alignment
Index   Value   % of L₁
69      +0.15   0.8%
81      +0.13   0.8%
458     +0.13   0.7%
Correlated Neurons
Index   P. Corr.   Cos Sim.
409     +0.15      0.04
241     +0.13      0.04
458     +0.13      0.04
Negative Logits
Token          Logit
yours          -1.62
?).            -1.52
ity            -1.48
...](          -1.46
onia           -1.44
late           -1.42
thon           -1.41
nobody         -1.41
haps           -1.40
OutputStream   -1.37
Positive Logits
Token    Logit
bed      1.58
itars    1.47
teenth   1.42
ycin     1.40
zerba    1.40
ele      1.38
abad     1.37
cale     1.37
iate     1.33
ultan    1.32
Activations Density: 0.031%