INDEX
Explanations
personal experiences and anecdotes mentioned in the text
Neuron Alignment
Index   Value   % of L₁
1978    +0.10   0.3%
227     +0.10   0.3%
90      +0.09   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1919    +0.10      0.04
862     +0.10      0.02
1404    +0.09      0.03
Negative Logits
serons      -0.57
their       -0.48
recev       -0.48
Izvori      -0.48
Chham       -0.47
savons      -0.46
faisons     -0.46
MarshalTo   -0.46
متعلقه      -0.45
łgor        -0.45
Positive Logits
ourselves    1.13
apprehen     0.99
encomp       0.95
shenan       0.95
Souha        0.92
Messieurs    0.91
stickied     0.88
unspeak      0.88
intersper    0.86
disreg       0.85
Activations Density: 0.225%