INDEX
Explanations: scientific research paper references
Neuron Alignment
Index   Value   % of L₁
1343    +0.17   0.5%
876     +0.13   0.4%
453     +0.12   0.4%
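The "% of L₁" column appears to express each neuron's weight as a share of the L₁ norm of the feature's decoder vector. Below is a minimal Python sketch of that calculation. It assumes the decoder vector is available as a plain list of floats and that neurons are ranked by absolute weight. The function name `neuron_alignment` and the toy vector are hypothetical, not taken from the dashboard.

```python
def neuron_alignment(decoder_row, k=3):
    """Return (neuron index, weight, share of L1 norm) for the k neurons
    with the largest absolute weights in one feature's decoder vector.

    Assumption: ranking by absolute weight; the dashboard may rank differently.
    """
    l1_norm = sum(abs(w) for w in decoder_row)
    # Rank neuron indices by the magnitude of their decoder weight, largest first.
    ranked = sorted(range(len(decoder_row)),
                    key=lambda i: abs(decoder_row[i]),
                    reverse=True)
    return [(i, decoder_row[i], abs(decoder_row[i]) / l1_norm)
            for i in ranked[:k]]


# Hypothetical toy decoder vector whose L1 norm is 1.0.
print(neuron_alignment([0.1, -0.4, 0.3, 0.2], k=2))
```

With this reading, a weight's share of L₁ is simply its magnitude divided by the sum of all magnitudes in the decoder vector.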
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
780     +0.17           0.02
924     +0.13           0.02
1784    +0.12           0.02
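The two columns of the Correlated Neurons table can be read as a Pearson correlation and a cosine similarity. Below is a minimal sketch, assuming both are computed between the feature's per-token activations and each neuron's per-token activations over the same sample of tokens. The dashboard does not state which vectors enter the cosine similarity, so treat that pairing as an assumption. The function names and toy activations are hypothetical.

```python
import math


def pearson_corr(x, y):
    """Pearson correlation: covariance divided by the product of standard
    deviations. The 1/n factors cancel in the ratio, so they are omitted."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def cosine_sim(u, v):
    """Cosine similarity: dot product over the product of Euclidean norms,
    with no mean-centring."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Hypothetical activations over five tokens.
feature_acts = [0.0, 0.0, 2.1, 0.0, 0.7]
neuron_acts = [0.3, -0.1, 1.5, 0.2, 0.4]
print(pearson_corr(feature_acts, neuron_acts), cosine_sim(feature_acts, neuron_acts))
```

The design difference between the two measures is mean-centring. Pearson correlation subtracts each series' mean before comparing, while cosine similarity compares the raw vectors, so the two can diverge for the same pair of inputs.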
Negative Logits
unspeak       -1.25
Shakspeare    -1.18
swarovski     -1.09
indescri      -1.01
gaily         -1.01
Whence        -1.01
Juf           -1.00
unwarran      -0.98
shewn         -0.97
inconce       -0.96
Positive Logits
<bos>         0.87
PhysRev       0.68
verkle        0.63
impon         0.62
巽            0.56
eleg          0.54
toller        0.53
Trieb         0.53
potes         0.53
journal       0.53
Activation Density: 0.046%
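Activation density is the share of tokens on which the feature is active. At 0.046%, that works out to about 460 tokens per million. Below is a minimal sketch, assuming "active" means an activation strictly above zero. The `threshold` parameter is a hypothetical knob, not something the dashboard exposes, and the function name and toy activations are likewise illustrative.

```python
def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`.

    Assumption: a token counts as active when its activation is strictly
    greater than the threshold (zero by default).
    """
    active = sum(1 for a in acts if a > threshold)
    return active / len(acts)


# Hypothetical activations over five tokens: two are active.
print(f"{activation_density([0.0, 0.0, 0.5, 0.0, 1.2]):.1%}")  # prints 40.0%
```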