INDEX
Explanations
the pronoun 'it' mentioned in the text
Neuron Alignment
Index   Value   % of L₁
1438    +0.14   0.5%
25      +0.12   0.4%
1984    +0.11   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
25      +0.14      0.06
1438    +0.12      0.05
101     +0.11      0.04
Negative Logits
?...        -1.27
ftu         -1.17
NOO         -1.15
··          -1.15
squa        -1.14
»>          -1.13
!...        -1.13
aen         -1.13
.-"         -1.12
fte         -1.12
Positive Logits
FetchType   0.69
it          0.66
GARET       0.66
It          0.65
It          0.65
MMdd        0.61
มัน         0.61
它          0.60
.           0.60
them        0.60
Activations Density 0.211%