INDEX
    Explanations

    attends to the last character in a token from a nearby token marked with specific labels

    Head Attr Weights
    0:0.13
    1:0.14
    2:0.08
    3:0.13
    4:0.13
    5:0.15
    6:0.11
    7:0.10
    Negative Logits
    "ρώ"            -0.37
    "Manbalar"      -0.37
    " FIR"          -0.34
    " foncé"        -0.34
    "FIR"           -0.34
    " pubblici"     -0.34
    " européens"    -0.33
    "дь"            -0.33
    "OB"            -0.33
    "ışık"          -0.33
    Positive Logits
    "AddHtmlAttribute"    0.37
    "expandindo"          0.36
    "tonode"              0.35
    " فريبيس"             0.34
    "nestjs"              0.33
    "Datuak"              0.33
    " estimés"            0.33
    "ParallelGroup"       0.32
    " ErrIntOverflow"     0.32
    "новниш"              0.31
    Act Density 0.003%

    No Known Activations