INDEX
Explanations
various forms of the word "op."
Neuron Alignment
Index   Value   % of L₁
47      +0.16   0.9%
183     +0.13   0.7%
278     +0.11   0.7%
Correlated Neurons
Index   Pearson Corr.   Cos Sim.
47      +0.16           0.11
136     +0.13           -0.06
414     +0.11           0.08
Negative Logits
sworn     -1.53
sure      -1.43
remed     -1.38
then      -1.38
Develop   -1.38
locals    -1.36
Authors   -1.35
fiddle    -1.35
County    -1.34
Works     -1.33
Positive Logits
¼                       4.14
·                       4.05
↵                       4.00
(blank)                 4.00
<|outofrange|>          4.00
↵↵                      4.00
↵ + U+2003 (em space)   4.00
↵↵                      4.00
(blank)                 4.00
↵                       4.00
Activations Density: 2.755%