INDEX
Explanations
phrases and terms related to description and narration
Neuron Alignment
Index    Value    % of L₁
376      +0.22    1.2%
156      +0.15    0.9%
457      +0.14    0.8%
Correlated Neurons
Index    P. Corr.    Cos Sim.
345      +0.22       0.02
364      +0.15       0.01
311      +0.14       0.01
Negative Logits
illard    -1.66
ska       -1.63
uten      -1.59
HS        -1.58
ori       -1.56
ż         -1.54
\]\].     -1.51
enue      -1.47
)]{}      -1.46
\]]{}     -1.42
Positive Logits
ably           2.32
how            1.73
error          1.51
deprivation    1.45
omerase        1.40
atroc          1.38
disorder       1.36
them           1.34
quer           1.33
errors         1.33
Activations Density 0.010%