Explanations
The command "ignore" in various contexts, indicating a focus on disregarding or bypassing certain information or messages.
Neuron Alignment
Index    Value    % of L₁
354      +0.13    0.7%
369      +0.12    0.7%
281      +0.12    0.7%
Correlated Neurons
Index    P. Corr.    Cos Sim.
111      +0.13       0.04
59       +0.12       0.03
332      +0.12       0.03
Negative Logits
footsteps         -1.93
¿½                -1.91
izes              -1.79
č↵                -1.78
↵ ↵               -1.78
<|outofrange|>    -1.78
<|outofrange|>    -1.78
↵↵                -1.78
↵                 -1.78
<|padding|>       -1.78
Positive Logits
\.            1.87
uer           1.81
\)            1.80
ULT           1.62
downloaded    1.58
uet           1.58
cke           1.56
uber          1.50
reland        1.50
ienn          1.48
Activations Density 0.061%