INDEX
Explanations
phrases or sentences related to a narrative or storytelling context with a specific focus on certain characters or events
Neuron Alignment
Index   Value   % of L₁
1356    +0.09   0.3%
1334    +0.09   0.3%
1984    +0.09   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1334    +0.09      0.06
752     +0.09      0.05
1984    +0.09      0.07
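The page does not say how the P. Corr. and Cos Sim. columns are computed. As a rough sketch, assuming they are the standard Pearson correlation and cosine similarity between two equal-length vectors, the difference between the two measures can be illustrated as follows. The function names and the sample vectors are hypothetical and are not taken from the dashboard.

```python
import math

def pearson(x, y):
    """Pearson correlation: covariance of the mean-centred vectors
    divided by the product of their standard deviations."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def cosine(x, y):
    """Cosine similarity: dot product divided by the product of the
    Euclidean norms, with no mean-centring."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny)

# Hypothetical vectors (illustrative only, not dashboard data).
u = [1.0, 2.0, 3.0]
v = [11.0, 12.0, 13.0]  # u shifted by a constant offset

r = pearson(u, v)  # the offset is removed by mean-centring
c = cosine(u, v)   # the offset changes the direction of v
```

Because Pearson correlation subtracts the mean and cosine similarity does not, the two can disagree on the same pair of vectors, which is one plausible reason a dashboard would report both.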
Negative Logits
Token            Logit
Autoritní        -0.60
arbox            -0.52
Jîn              -0.51
تانيه            -0.51
GraphicsUnit     -0.49
تضيفلها          -0.48
éndez            -0.46
embroidered      -0.45
AnchorStyles     -0.45
RegressionTest   -0.45
Positive Logits
Token       Logit
indestru    0.77
ambass      0.75
shenan      0.73
philanth    0.72
scrat       0.71
strick      0.70
concier     0.69
milf        0.69
viciss      0.69
michelin    0.69
Activations Density 0.679%