INDEX
Explanations
conjunctions that express contrast or opposition
Neuron Alignment
Index    Value    % of L₁
273      +0.14    0.8%
242      +0.11    0.6%
67       +0.10    0.6%
Correlated Neurons
Index    P. Corr.    Cos Sim.
500      +0.14       0.01
274      +0.11       0.01
405      +0.10       0.01
Negative Logits
!”                -1.79
!’                -1.76
!),               -1.65
↵                 -1.57
<|outofrange|>    -1.57
↵↵                -1.57
↵                 -1.57
                  -1.57
↵ âĢĥ             -1.57
↵                 -1.57
Positive Logits
inine       1.84
onset       1.57
WHM         1.52
\%          1.50
ondo        1.47
elic        1.45
CLUSION     1.43
protease    1.42
ortium      1.41
ausing      1.37
Activations Density: 0.001%