INDEX
    Explanations

    Attends to tokens that express a contrast or condition, from related affirmative tokens appearing later in the sequence.

    Head Attr Weights
    0:0.27
    1:0.21
    2:0.14
    3:0.10
    4:0.06
    5:0.02
    6:0.07
    7:0.09
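
A minimal sketch of how the head attribution weights listed above could be read and ranked. The `head:weight` string format is taken from the listing; the helper names `parse_head_weights` and `rank_heads` are hypothetical, and reading the weights as per-head shares of the feature's attribution is an assumption, not something the listing states.

```python
# Head attribution weights copied from the listing above ("head:weight").
raw = ["0:0.27", "1:0.21", "2:0.14", "3:0.10",
       "4:0.06", "5:0.02", "6:0.07", "7:0.09"]


def parse_head_weights(entries):
    """Turn "head:weight" strings into a {head_index: weight} dict."""
    weights = {}
    for entry in entries:
        head, weight = entry.split(":")
        weights[int(head)] = float(weight)
    return weights


def rank_heads(weights):
    """Return (head, weight) pairs sorted by descending weight."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)


weights = parse_head_weights(raw)
ranked = rank_heads(weights)

# The listed weights need not sum to exactly 1 (rounding), so normalise
# by their total before computing the share of the top three heads.
total = sum(weights.values())
top3_share = sum(w for _, w in ranked[:3]) / total

print(ranked[:3])
print(round(top3_share, 3))
```

Under these assumptions, heads 0, 1 and 2 carry most of the listed weight, while head 5 contributes the least.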
    Negative Logits
    ConstraintMaker    -0.33
    GEBURTSDATUM    -0.29
     tartalomajánló    -0.29
     ویکی‌پدیای    -0.28
    Clik    -0.28
     newOwner    -0.28
    makeConstraints    -0.28
     StringTokenizer    -0.27
     Monter    -0.27
     ModelExpression    -0.27
    Positive Logits
    ffet    0.30
    énie    0.30
    pters    0.30
     Italijani    0.29
    varande    0.28
     Réponses    0.28
    ieu    0.27
    textAppearance    0.27
    odly    0.27
    GeneratedCode    0.26
    Act Density 0.626%

    No Known Activations