INDEX
Explanations
section headers or formatting indicators in documents
Neuron Alignment
Index  Value  % of L₁
115    +0.14  0.8%
129    +0.13  0.7%
250    +0.11  0.6%
Correlated Neurons
Index  P. Corr.  Cos Sim.
48     +0.14     0.01
261    +0.13     0.01
390    +0.11     0.01
Negative Logits
.^[@    -1.50
)^[@    -1.49
chell   -1.48
winner  -1.45
neb     -1.44
^[@     -1.39
zin     -1.39
CBC     -1.38
"/      -1.36
ago     -1.35
Positive Logits
ĻĤ       1.59
udes     1.52
iberal   1.48
ington   1.47
regards  1.45
andals   1.45
gate     1.43
notes    1.41
hora     1.39
nabla    1.39
Activations Density 0.010%