INDEX
Explanations
interrogative sentences that pose questions
Neuron Alignment
Index   Value   % of L₁
302     +0.14   0.8%
419     +0.13   0.7%
138     +0.13   0.7%
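One plausible reading of the Neuron Alignment table, sketched under assumptions: each row is a single entry of the feature's decoder vector in the MLP neuron basis, and "% of L₁" is that entry's magnitude divided by the vector's L₁ norm. The helper name `neuron_alignment` and the ranking by absolute value are hypothetical, not taken from the dashboard's own code.

```python
def neuron_alignment(decoder_vec, k=3):
    """Hypothetical helper: the k decoder entries with the largest magnitude,
    each paired with its share of the vector's L1 norm."""
    l1 = sum(abs(w) for w in decoder_vec)
    # Rank neuron indices by |weight|; sorted() is stable, so ties keep index order.
    ranked = sorted(range(len(decoder_vec)), key=lambda i: abs(decoder_vec[i]), reverse=True)
    return [(i, decoder_vec[i], abs(decoder_vec[i]) / l1) for i in ranked[:k]]

# Toy decoder vector (assumed values) whose L1 norm is exactly 1.0.
for index, value, share in neuron_alignment([0.25, -0.5, 0.125, 0.125], k=2):
    print(f"{index}  {value:+.2f}  {share:.1%}")
```

Under this reading, the small percentages above (under 1% each) suggest the feature's decoder weight is spread across many neurons rather than concentrated in one.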
Correlated Neurons
Index   P. Corr.   Cos Sim.
302     +0.14      0.06
16      +0.13      0.05
390     +0.13      0.03
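A minimal sketch of the two statistics in the Correlated Neurons table, assuming "P. Corr." is the Pearson correlation between the feature's and a neuron's activations over the same tokens. The dashboard does not state which vectors "Cos Sim." compares, so `cosine` below simply takes any two equal-length vectors; both helper names and the toy activations are illustrative.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (e.g. per-token activations)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def cosine(u, v):
    """Cosine similarity: dot product over the product of L2 norms (no centering)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

feature_acts = [0.0, 1.2, 0.0, 3.1]  # toy per-token feature activations
neuron_acts = [0.1, 0.9, -0.2, 2.5]  # toy per-token neuron activations
print(round(pearson(feature_acts, neuron_acts), 3), round(cosine(feature_acts, neuron_acts), 3))
```

Pearson correlation is the cosine similarity of the mean-centered vectors, which is why the two columns can disagree in magnitude for the same neuron.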
Negative Logits
Ļª   -2.40
ģ    -2.32
ļ    -2.10
Ĵ    -2.02
ĸ    -1.98
¼    -1.90
ı    -1.89
¨    -1.88
ŀ    -1.87
į    -1.86
Positive Logits
questions   2.37
questions   2.21
answered    2.19
aloud       2.05
answered    2.02
Answer      1.90
answer      1.89
Question    1.89
asking      1.88
answering   1.80
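A sketch of how logit lists like the two above are commonly produced, under stated assumptions: the feature's decoder direction is projected straight onto each token's unembedding vector (ignoring any final LayerNorm), and the lowest and highest scores are reported. The toy `unembed` mapping and the `logit_effects` name are hypothetical.

```python
def logit_effects(decoder_vec, unembed, k=10):
    """Hypothetical helper: score every token by the dot product of the feature's
    decoder direction with that token's unembedding vector, then return the
    k lowest (negative) and k highest (positive) scores."""
    scores = {tok: sum(d * w for d, w in zip(decoder_vec, vec)) for tok, vec in unembed.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending by score
    return ranked[:k], ranked[::-1][:k]

# Toy two-dimensional unembedding vectors (assumed values).
unembed = {"questions": [1.0, 0.5], "answer": [0.8, 0.2], "ĸ": [-1.0, 0.0]}
negative, positive = logit_effects([2.0, 1.0], unembed, k=1)
print(negative, positive)
```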
Activation Density: 1.111%
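Activation density is read here as the fraction of tokens on which the feature fires; the sketch assumes "fires" means an activation strictly above zero, and the `activation_density` helper is hypothetical. For scale, 1.111% corresponds to roughly one token in 90.

```python
def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed to be 0)."""
    return sum(1 for a in acts if a > threshold) / len(acts)

acts = [0.0] * 89 + [0.7]  # toy data: the feature fires on 1 of 90 tokens
print(f"{activation_density(acts):.3%}")
```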