INDEX
Explanations
the word "one" in various contexts
Neuron Alignment
Index   Value   % of L₁
271     +0.13   0.7%
190     +0.12   0.7%
307     +0.12   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
401     +0.13      0.05
404     +0.12      0.04
307     +0.12      0.03
Negative Logits
lications   -1.84
))?         -1.80
"))         -1.74
))\         -1.58
terday      -1.55
'?"         -1.55
)))         -1.54
)$)         -1.53
"?          -1.52
))=         -1.51
Positive Logits
↵                 1.50
(blank)           1.50
(blank)           1.50
č↵                1.50
↵                 1.50
↵                 1.50
↵                 1.50
(blank)           1.50
↵                 1.50
<|outofrange|>    1.50
Activations Density 0.152%