INDEX
Explanations
references to numbers or quantitative information
Neuron Alignment
Index   Value   % of L₁
538     +0.11   0.3%
381     +0.11   0.3%
204     +0.10   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1919    +0.11      0.07
1415    +0.11      0.04
1510    +0.10      0.04
Negative Logits
poliester   -1.01
lele        -0.89
gend        -0.89
meras       -0.87
sement      -0.86
cyr         -0.86
augus       -0.85
parati      -0.85
poliuret    -0.85
ohr         -0.82
Positive Logits
shenan      0.95
unspeak     0.89
prolly      0.87
horrend     0.87
unavoid     0.85
intersper   0.84
encomp      0.83
indestru    0.81
needn       0.79
laug        0.79
Activations Density: 0.247%