INDEX
Explanations
Attends from a later token to an idea presented earlier in the sequence.
Head Attr Weights
0:0.09
1:0.11
2:0.12
3:0.19
4:0.12
5:0.03
6:0.16
7:0.15
Negative Logits
contentLoaded: -0.31
ng: -0.28
Référence: -0.27
herin: -0.27
jena: -0.27
lij: -0.26
kal: -0.26
Initially: -0.26
estimés: -0.26
TestBed: -0.25
Positive Logits
ConstraintMaker: 0.36
NUMX: 0.35
ślę: 0.33
GenerationType: 0.31
ofür: 0.31
ſted: 0.30
+#+#: 0.29
bibinfo: 0.29
ksesta: 0.29
ientôt: 0.29
Activations Density 0.237%