INDEX
Explanations
references to meal-related content
Neuron Alignment
Index   Value   % of L₁
156     +0.25   1.5%
93      +0.11   0.6%
376     +0.11   0.6%
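A minimal sketch of how the Neuron Alignment columns could be computed. It assumes, since this page does not say so, that "Value" is an entry of the feature's decoder vector over MLP neurons and that "% of L₁" is that entry's absolute value as a share of the whole vector's L₁ norm. Under that reading, +0.25 at 1.5% implies an L₁ norm of roughly 0.25 / 0.015 ≈ 17. The function name `neuron_alignment`, the ranking by absolute weight, and the toy vector are all hypothetical.

```python
def neuron_alignment(decoder_vec, k=3):
    """Return the k neurons with the largest |weight| in a feature's decoder vector.

    Each result is (neuron_index, weight, percent_of_l1), where percent_of_l1 is
    |weight| divided by the L1 norm of the whole vector (assumed meaning of "% of L₁").
    """
    l1 = sum(abs(w) for w in decoder_vec)
    # Rank neuron indices by absolute weight, largest first (Python's sort is stable).
    ranked = sorted(range(len(decoder_vec)), key=lambda i: abs(decoder_vec[i]), reverse=True)
    return [(i, decoder_vec[i], 100.0 * abs(decoder_vec[i]) / l1) for i in ranked[:k]]


# Toy 4-neuron decoder vector (hypothetical, not this feature's weights).
toy = [0.125, -0.5, 0.25, 0.125]
print(neuron_alignment(toy, k=2))  # [(1, -0.5, 50.0), (2, 0.25, 25.0)]
```

The dashboard may instead rank by signed value; swapping `abs(decoder_vec[i])` for `decoder_vec[i]` in the sort key would give that variant.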
Correlated Neurons
Index   Pearson Corr.   Cos. Sim.
93      +0.25           0.01
361     +0.11           0.01
241     +0.11           0.01
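The two statistics in this table can be sketched as below. The Pearson correlation would presumably be taken between the feature's activations and a neuron's activations over the same tokens; what the cosine similarity compares is not stated on this page, so `cosine_similarity` simply takes two equal-length vectors. Both function names and all toy inputs are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (e.g. per-token activations)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; unlike Pearson, no mean is subtracted."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Toy per-token activations (hypothetical): the neuron is an exact multiple of the
# feature, so the Pearson correlation is 1 up to floating-point rounding.
feature = [0.0, 1.0, 2.0, 3.0]
neuron = [0.0, 2.0, 4.0, 6.0]
print(round(pearson(feature, neuron), 6))                    # 1.0
print(round(cosine_similarity([1.0, 0.0], [0.0, 1.0]), 6))   # 0.0
```

Pearson correlation is the cosine similarity of the mean-centred sequences, which is why the two columns can differ for the same neuron.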
Negative Logits
ĥ½               -3.61
<|outofrange|>   -3.55
                 -3.55
<|outofrange|>   -3.55
                 -3.55
↵                -3.55
↵↵               -3.55
<|outofrange|>   -3.55
č↵               -3.55
↵                -3.55
Positive Logits
heet             1.75
ante             1.74
argument         1.54
nier             1.50
"};              1.46
uit              1.46
pun              1.41
ettle            1.41
etheless         1.39
                 1.39
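One common way to produce Negative and Positive Logits lists like these, sketched under assumptions: take the feature's output direction in the residual stream, dot it with each token's unembedding vector, and report the most negative and most positive scores. Whether this dashboard also applies a final layer norm or other scaling is not stated here. The function `logit_effects`, the tiny vocabulary, and the unembedding rows are hypothetical.

```python
def logit_effects(direction, unembed, vocab, k=2):
    """Score every token by the dot product of `direction` with its unembedding row.

    direction: list[float] of length d_model (the feature's output direction, assumed).
    unembed:   one list[float] of length d_model per token, aligned with `vocab`.
    Returns (top_k_positive, top_k_negative) as lists of (token, score).
    """
    scores = [(tok, sum(a * b for a, b in zip(direction, row)))
              for tok, row in zip(vocab, unembed)]
    scores.sort(key=lambda pair: pair[1])  # ascending by score; ties keep vocab order
    positive = scores[::-1][:k]            # largest scores first
    negative = scores[:k]                  # most negative scores first
    return positive, negative


vocab = ["a", "b", "c", "d"]
unembed = [[2.0, 0.0], [-1.0, 5.0], [0.5, 1.0], [-3.0, 0.0]]
pos, neg = logit_effects([1.0, 0.0], unembed, vocab)
print(pos)  # [('a', 2.0), ('c', 0.5)]
print(neg)  # [('d', -3.0), ('b', -1.0)]
```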
Activation Density: 0.021%
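Activation density is usually the percentage of tokens on which the feature's activation is above zero; that definition is an assumption here, since the page does not spell it out. At 0.021%, the feature would fire on about one token in 4,762 (100 / 0.021 ≈ 4,762). `activation_density` and its `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold` (assumed to be 0)."""
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical activations: 21 firing tokens out of 100,000.
acts = [0.0] * 99_979 + [1.0] * 21
print(f"{activation_density(acts):.3f}%")  # 0.021%
```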