INDEX
Explanations
rhetorical questions or inquiries for clarification
Neuron Alignment
Index   Value   % of L₁
478     +0.14   0.8%
5       +0.14   0.7%
316     +0.11   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
5       +0.14      0.02
8       +0.14      0.01
466     +0.11      0.02
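
The "P. Corr." and "Cos Sim." columns above are read here as Pearson correlation and cosine similarity; that reading is an assumption, since the page does not spell out the abbreviations or say which vectors are compared. Under that assumption, the minimal sketch below shows how the two measures differ. The function names `pearson` and `cosine` and the sample vectors are hypothetical and are not taken from the dashboard.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: covariance of the mean-centred values,
    divided by the product of their standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def cosine(xs, ys):
    """Cosine similarity: dot product divided by the product of the norms.
    Unlike Pearson correlation, the values are NOT mean-centred first."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have equal length")
    dot = sum(x * y for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum(x * x for x in xs))
    norm_y = math.sqrt(sum(y * y for y in ys))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)


if __name__ == "__main__":
    # Hypothetical data: b is a shifted by a constant offset.
    a = [1.0, 2.0, 3.0]
    b = [11.0, 12.0, 13.0]
    print("pearson:", pearson(a, b))
    print("cosine: ", cosine(a, b))
```

Because Pearson correlation centres the data and cosine similarity does not, the two can diverge, which would be one possible reason for a neuron showing a correlation of +0.14 alongside a cosine similarity near 0.02.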
Negative Logits
Token      Logit
manship    -2.14
iances     -1.95
nikov      -1.79
arts       -1.76
ies        -1.72
ivities    -1.70
iance      -1.68
bows       -1.67
ict        -1.65
herin      -1.63
Positive Logits
Token      Logit
"!"        1.86
Secondly   1.60
Į          1.59
quo        1.55
qc         1.51
ooo        1.47
½          1.46
YES        1.42
??         1.42
ī          1.40
Activations Density 0.058%
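
Activation density is commonly taken to mean the fraction of sampled tokens on which the feature fires. The page does not define it, so that reading, the `threshold` parameter, and the function name `activation_density` are all assumptions. A minimal sketch under those assumptions:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    `activations` is a hypothetical flat list with one feature activation
    per sampled token; the dashboard's real sampling scheme is not known.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # Hypothetical sample: the feature fires on 1 of 4 tokens.
    sample = [0.0, 0.0, 0.7, 0.0]
    print(f"{activation_density(sample):.3%}")
```

On this reading, a density of 0.058% corresponds to the feature being active on roughly 58 of every 100,000 sampled tokens.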