INDEX
Explanations
mentions of specific actions or notable events in a text
Neuron Alignment
Index   Value   % of L₁
872     +0.17   0.5%
1150    +0.16   0.5%
906     +0.11   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
872     +0.17      0.07
1150    +0.16      0.02
776     +0.11      0.06
Negative Logits
Token        Logit
bourg        -0.76
Shakspeare   -0.76
Whence       -0.74
noyau        -0.74
carrefour    -0.73
ekos         -0.71
delà         -0.68
xxvi         -0.67
Chapitre     -0.67
kooper       -0.66
Positive Logits
Token         Logit
paragraph     0.71
section       0.70
listing       0.70
listed        0.66
mention       0.66
mentions      0.62
page          0.61
paragraphs    0.61
description   0.58
wording       0.57
Activations Density 0.661%