INDEX
Explanation: descriptions of storytelling and narrative elements in text
Neuron Alignment
Index    Value    % of L₁
1870     +0.14    0.5%
1363     +0.12    0.5%
1677     +0.11    0.4%
Correlated Neurons
Index    Pearson Corr.    Cosine Sim.
1363     +0.14            0.02
1435     +0.12            0.02
1895     +0.11            0.02
Negative Logits
تضيفلها        -0.52
UpInside       -0.50
sho            -0.49
fficients      -0.47
Diweddarwch    -0.46
Health         -0.46
mpto           -0.46
Mucha          -0.45
dollis         -0.45
stylers        -0.44
Positive Logits
narrative      1.19
increa         1.17
Narrative      1.15
NARR           1.11
Narrative      1.11
narrative      1.09
narratives     1.07
scrat          1.06
impra          1.04
tolerably      1.04
Activations Density: 0.084%