Explanations
New Auto-Interp: instances of identifiers and IDs within the text
Neuron Alignment
Index   Value   % of L₁
156     +0.21   1.2%
23      +0.14   0.8%
116     +0.11   0.6%
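The "% of L₁" column reads as each neuron's weight divided by the L₁ norm of the feature's decoder vector. The first two rows (+0.21 at 1.2%, +0.14 at 0.8%) both imply an L₁ norm of about 17.5, and the third row is consistent with that within rounding. The sketch below shows that computation under two assumptions that the dashboard does not confirm: neurons are ranked by signed weight, and the decoder vector is available as a plain list. `neuron_alignment` and `toy_decoder` are hypothetical names, and the toy weights are not this feature's real weights.

```python
def neuron_alignment(decoder, k=3):
    """Return (index, weight, percent_of_l1) for the k largest decoder weights.

    Assumption: neurons are ranked by signed weight, and each weight is
    reported as a percentage of the decoder vector's L1 norm.
    """
    l1 = sum(abs(w) for w in decoder)
    ranked = sorted(range(len(decoder)), key=lambda i: decoder[i], reverse=True)
    return [(i, decoder[i], 100.0 * decoder[i] / l1) for i in ranked[:k]]


# Hypothetical toy decoder vector (not the real feature's weights).
toy_decoder = [0.05, -0.4, 0.21, 0.0, 0.14, -0.1, 0.11]
for index, weight, pct in neuron_alignment(toy_decoder):
    print(f"{index:<6}{weight:+.2f}  {pct:.1f}%")
```

Note that the negative weight at index 1 still counts toward the L₁ norm even though it is not listed. This is why the listed percentages can be small and sum to well under 100%.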
Correlated Neurons
Index   P. Corr.   Cos Sim.
23      +0.21      0.02
56      +0.14      0.03
458     +0.11      0.02
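The dashboard does not define these two columns. The sketch below assumes that "P. Corr." is the Pearson correlation between the feature's activations and a neuron's activations over the same tokens. It also assumes that "Cos Sim." is the cosine similarity between the feature's decoder vector and that neuron's standard basis direction. Under the second assumption, the cosine similarity reduces to d_i / ‖d‖₂. The function names and toy data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def cosine_with_basis(decoder, i):
    """Cosine similarity between a vector and the i-th standard basis vector."""
    norm = math.sqrt(sum(w * w for w in decoder))
    return decoder[i] / norm


# Hypothetical toy data: feature activations and one neuron's activations per token.
feature_acts = [0.0, 2.0, 0.0, 1.5, 0.0]
neuron_acts = [0.1, 0.9, -0.2, 0.7, 0.0]
toy_decoder = [3.0, 0.0, 4.0]

print(round(pearson(feature_acts, neuron_acts), 3))
print(cosine_with_basis(toy_decoder, 0))
```

Under these assumptions, the two metrics can diverge. A neuron can co-fire with the feature (moderate correlation) while contributing almost nothing to its decoder direction (cosine near zero). The table shows that pattern, with correlations of +0.11 to +0.21 and cosine similarities of 0.02 to 0.03.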
Negative Logits
ĻĤ       -1.81
gro      -1.59
quart    -1.59
ÃĤ       -1.57
↵ âĢĥ    -1.57
↵        -1.57
↵        -1.57
         -1.57
↵↵↵      -1.57
↵↵       -1.57
Positive Logits
necrosis     1.50
stripes      1.50
ycin         1.43
udeau        1.40
ctica        1.38
ivated       1.35
datetime     1.33
ubot         1.33
Providence   1.31
ives         1.27
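These two lists rank vocabulary tokens by how much the feature directly lowers or raises their logits. The sketch below assumes one common way of producing such scores, which the dashboard itself does not confirm: take the dot product of the feature's output direction with each token's unembedding vector, then keep the bottom and top k. The vocabulary, direction, and unembedding rows are hypothetical toy values, and `logit_effects` is an illustrative name.

```python
def logit_effects(direction, unembedding, vocab, k=3):
    """Return the k most negative and k most positive (token, score) pairs.

    Assumption: each score is the dot product of the feature's output
    direction with that token's unembedding vector.
    """
    scores = []
    for token, row in zip(vocab, unembedding):
        scores.append((token, sum(d * u for d, u in zip(direction, row))))
    scores.sort(key=lambda pair: pair[1])  # ascending: most negative first
    return scores[:k], scores[::-1][:k]


# Hypothetical toy vocabulary and 2-dimensional unembedding (one row per token).
vocab = ["tok_a", "tok_b", "tok_c", "tok_d", "tok_e"]
unembedding = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0], [0.0, -4.0]]
direction = [1.0, -0.5]

negative, positive = logit_effects(direction, unembedding, vocab, k=2)
print("Negative:", negative)
print("Positive:", positive)
```

Because `list.sort` is stable, tokens with tied scores keep their vocabulary order. This matters when many tokens share a value, as the run of -1.57 entries in the Negative Logits list does.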
Activation Density: 0.227%
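Activation density is most naturally read as the fraction of tokens on which the feature's activation is nonzero. That reading is an assumption, since the dashboard reports only the figure. On that reading, 0.227% means the feature fires on roughly 2 to 3 tokens per thousand. The sketch assumes nonnegative (ReLU-style) activations, so "greater than zero" and "nonzero" coincide. The `threshold` parameter and the toy activations are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    `threshold` is a hypothetical parameter; 0.0 counts any positive activation.
    """
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy activations: 3 of 1,000 tokens fire.
toy_acts = [0.0] * 1000
for position in (10, 500, 999):
    toy_acts[position] = 1.0
print(f"{activation_density(toy_acts):.3%}")  # prints 0.300%
```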