INDEX
Explanations
references to large-scale systems or operations
Neuron Alignment
Index   Value   % of L₁
45      +0.17   1.0%
198     +0.12   0.7%
327     +0.12   0.7%
Correlated Neurons
Index   P. Corr.   Cos Sim.
119     +0.17      0.03
45      +0.12      0.03
14      +0.12      0.04
Negative Logits
rile     -1.90
etine    -1.56
ilde     -1.52
ETHOD    -1.51
уч       -1.48
prompt   -1.48
riev     -1.47
dawn     -1.46
ICES     -1.45
ierno    -1.44
Positive Logits
(\>           2.60
amounts       2.36
(≥            2.29
(>            2.29
(\~           2.17
enough        2.11
sized         2.10
proportions   2.06
quantities    2.05
amount        1.95
Activations Density 0.099%