INDEX
Explanations
phrases related to challenges and problem-solving within a narrative context
Neuron Alignment
Index    Value    % of L₁
156      +0.18    1.0%
274      +0.18    1.0%
118      +0.14    0.8%
Correlated Neurons
Index    P. Corr.    Cos Sim.
274      +0.18       0.20
118      +0.18       0.21
81       +0.14       0.13
Negative Logits
Token       Logit
Īĺ          -1.86
¦           -1.80
±           -1.71
inward      -1.65
idegger     -1.64
Ľ           -1.59
Appl        -1.58
ITED        -1.56
¬           -1.54
Comments    -1.53
Positive Logits
Token      Logit
ipore      1.83
rolet      1.73
$.\        1.71
£          1.71
factory    1.65
ressor     1.62
(£         1.54
abroad     1.52
caster     1.51
ielder     1.51
Activations Density: 5.404%