Explanations
The pronoun "they" used in various contexts (New Auto-Interp)
Neuron Alignment
Index  Value  % of L₁
169    +0.13  0.7%
370    +0.13  0.7%
473    +0.12  0.7%
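One way to read the Neuron Alignment table: each row gives a neuron index, the signed weight the feature's decoder vector places on that neuron, and that weight's share of the vector's L₁ norm. The sketch below assumes exactly that definition; the helper `neuron_alignment` and the toy vector are hypothetical, not the viewer's own code.

```python
from typing import List, Tuple


def neuron_alignment(decoder: List[float], k: int = 3) -> List[Tuple[int, float, float]]:
    """Return the k neurons with the largest |weight| in a feature's decoder vector.

    Hypothetical helper: assumes "Value" is the signed decoder weight and
    "% of L1" is |weight| divided by the L1 norm of the whole decoder vector.
    Each result is (neuron index, signed weight, fraction of L1 norm).
    """
    l1 = sum(abs(w) for w in decoder)
    if l1 == 0.0:
        return []  # an all-zero direction has no aligned neurons
    # Rank neuron indices by weight magnitude, largest first.
    ranked = sorted(range(len(decoder)), key=lambda i: abs(decoder[i]), reverse=True)
    return [(i, decoder[i], abs(decoder[i]) / l1) for i in ranked[:k]]


# Toy decoder vector over 8 neurons (illustrative values only).
toy = [0.0, 0.0, 0.5, 0.0, 0.0, -0.375, 0.0, 0.125]
for index, value, share in neuron_alignment(toy):
    print(f"{index:>3}  {value:+.3f}  {share:.1%}")
```

Under this reading, a top share of 0.7% means no single neuron carries even 1% of the direction's L₁ mass.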
Correlated Neurons
Index  Pearson Corr.  Cos. Sim.
74     +0.13          0.11
352    +0.13          0.11
143    +0.12          0.09
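The Correlated Neurons table pairs each neuron with two numbers. A minimal sketch, assuming the first is the Pearson correlation between the feature's and the neuron's activations over a token sample, and the second is the cosine similarity between the feature's decoder vector and the neuron's one-hot basis direction (which reduces to `decoder[i] / ||decoder||₂`). Both readings, the helper names, and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two equal-length samples (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0.0 or sy == 0.0:
        return 0.0  # correlation is undefined for a constant series
    return cov / (sx * sy)


def correlated_neurons(
    feature_acts: Sequence[float],
    neuron_acts: Sequence[Sequence[float]],
    decoder: Sequence[float],
    k: int = 3,
) -> List[Tuple[int, float, float]]:
    """Rank neurons by correlation with the feature (hypothetical helper).

    feature_acts[t]  : feature activation on token t
    neuron_acts[i][t]: activation of neuron i on token t
    decoder[i]       : the feature's decoder weight on neuron i (assumed)
    Returns (neuron index, Pearson correlation, cosine similarity) rows.
    """
    norm = math.sqrt(sum(d * d for d in decoder))
    rows = []
    for i, acts in enumerate(neuron_acts):
        # Cosine similarity with the one-hot basis vector e_i is decoder[i] / ||decoder||.
        cos = decoder[i] / norm if norm else 0.0
        rows.append((i, pearson(feature_acts, acts), cos))
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows[:k]


# Toy data: 4 tokens, 3 neurons (illustrative values only).
feature = [0.0, 1.0, 2.0, 3.0]
neurons = [[0.0, 2.0, 4.0, 6.0], [3.0, 2.0, 1.0, 0.0], [1.0, 1.0, 1.0, 1.0]]
for row in correlated_neurons(feature, neurons, decoder=[3.0, 0.0, 4.0], k=2):
    print(row)
```

Under these assumptions the two columns measure different things: correlation reflects co-firing on data, while cosine similarity reflects how much of the feature's direction points along that neuron.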
Negative Logits
Token        Logit
olester      -1.55
enance       -1.41
bore         -1.36
member       -1.36
jcmm         -1.33
oker         -1.33
essel        -1.30
wide         -1.29
experienced  -1.26
mux          -1.26
Positive Logits
Token        Logit
ories         1.71
ités          1.51
pronounce     1.46
ureus         1.45
arynge        1.45
orie          1.44
wash          1.34
uent          1.29
gett          1.27
conclusions   1.25
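The two logit lists read as the tokens whose logits the feature most lowers and most raises through its direct path to the output. A minimal sketch, assuming the weights come from projecting the feature's decoder direction through the unembedding matrix and ignoring any final layer norm; `logit_weights`, `top_and_bottom`, and the toy vocabulary are hypothetical.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[str, float]


def logit_weights(decoder: Sequence[float], unembed: Sequence[Sequence[float]]) -> List[float]:
    """Direct logit effect of a feature direction, one entry per vocabulary token.

    unembed[t] is the unembedding vector for token t (same width as decoder),
    so the effect on token t is the dot product unembed[t] . decoder.
    Any final layer norm is ignored in this sketch.
    """
    return [sum(u * d for u, d in zip(row, decoder)) for row in unembed]


def top_and_bottom(
    weights: Sequence[float], vocab: Sequence[str], k: int = 10
) -> Tuple[List[Pair], List[Pair]]:
    """Return (k most negative, k most positive) tokens with their weights."""
    if k <= 0:
        return [], []
    order = sorted(range(len(weights)), key=lambda t: weights[t])  # ascending
    bottom = [(vocab[t], weights[t]) for t in order[:k]]
    top = [(vocab[t], weights[t]) for t in reversed(order[-k:])]  # largest first
    return bottom, top


# Toy example: 2-dimensional residual stream, 4-token vocabulary.
vocab = ["a", "b", "c", "d"]
unembed = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0], [0.5, 0.0]]
negative, positive = top_and_bottom(logit_weights([1.0, -1.0], unembed), vocab, k=2)
print("Negative:", negative)
print("Positive:", positive)
```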
Activation Density: 0.142%
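Activation density is read here as the fraction of sampled tokens on which the feature's activation is positive; that definition and the helper name `activation_density` are assumptions. At 0.142%, the feature fires on roughly one token in 700 (1 / 0.00142 ≈ 704).

```python
from typing import Sequence


def activation_density(acts: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not acts:
        return 0.0
    return sum(1 for a in acts if a > threshold) / len(acts)


# Toy sample: 3 active tokens out of 2,000 gives a density of 0.15%.
sample = [0.0] * 2000
sample[10] = sample[500] = sample[1999] = 0.8
print(f"{activation_density(sample):.3%}")
```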