INDEX
    Explanations

    Attends from a later token back to an idea presented earlier in the sequence.

    Head Attr Weights
    0:0.09
    1:0.11
    2:0.12
    3:0.19
    4:0.12
    5:0.03
    6:0.16
    7:0.15
    Negative Logits
    contentLoaded     -0.31
    ng                -0.28
     Référence        -0.27
    herin             -0.27
    jena              -0.27
    lij               -0.26
     kal              -0.26
    Initially         -0.26
     estimés          -0.26
     TestBed          -0.25
    Positive Logits
    ConstraintMaker   0.36
    NUMX              0.35
    ślę               0.33
     GenerationType   0.31
    ofür              0.31
    ſted              0.30
    +#+#              0.29
    bibinfo           0.29
    ksesta            0.29
    ientôt            0.29
    Act Density 0.237%

    No Known Activations