INDEX
Explanations
locations and titles related to key events
Neuron Alignment
Index   Value   % of L₁
1577    +0.28   0.9%
1343    +0.24   0.8%
1842    +0.24   0.8%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1842    +0.28      0.10
856     +0.24      0.08
1343    +0.24      0.10
Negative Logits
disagre       -0.94
Shakspeare    -0.93
unwarran      -0.89
unlaw         -0.84
reluct        -0.82
tolerably     -0.82
shenan        -0.82
Daven         -0.82
withal        -0.81
strick        -0.80
Positive Logits
{}            0.63
للمعارف       0.53
fono          0.50
שוליים        0.48
mej           0.47
fromCharCode  0.47
peniten       0.46
"}")          0.46
gela          0.46
fycat         0.45
Activations Density 1.779%