INDEX
    Explanations

    attends to the same or similar tokens from different preceding tokens

    Head Attr Weights
    Head  Weight
    0     0.05
    1     0.09
    2     0.07
    3     0.12
    4     0.44
    5     0.05
    6     0.07
    7     0.06
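The head attribution weights above are dominated by a single head. A minimal sketch of how one might read that off programmatically follows; the helper name `dominant_head` is hypothetical, and the values are copied from the listing. The listed weights sum to 0.95 rather than 1, which may simply reflect rounding to two decimals, so the share is computed against the listed total.

```python
# Head attribution weights copied from the listing (head index -> weight).
head_weights = {0: 0.05, 1: 0.09, 2: 0.07, 3: 0.12, 4: 0.44, 5: 0.05, 6: 0.07, 7: 0.06}


def dominant_head(weights):
    """Return (head, weight, share_of_total) for the largest weight.

    Hypothetical helper for illustration; not part of any dashboard API.
    """
    total = sum(weights.values())
    head = max(weights, key=weights.get)
    return head, weights[head], weights[head] / total


head, weight, share = dominant_head(head_weights)
print(f"head {head}: weight {weight:.2f}, {share:.1%} of the listed total")
# prints: head 4: weight 0.44, 46.3% of the listed total
```

Head 4 carries roughly as much weight as the other seven heads combined, which is consistent with the explanation describing a single attention behavior.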
    Negative Logits
    Token            Logit
    "ști"            -0.26
    " Strickland"    -0.24
    " thước"         -0.23
    " vertes"        -0.23
    " sk"            -0.23
    "b"              -0.22
    " jokingly"      -0.22
    " kế"            -0.21
    " justement"     -0.21
    " vode"          -0.21
    Positive Logits
    Token               Logit
    "SequentialGroup"   0.37
    " ostavi"           0.34
    "StoryboardSegue"   0.34
    " '\\;'"            0.32
    " GenerationType"   0.31
    " المعيارى"         0.31
    " Réponses"         0.31
    "+:+"               0.30
    "Aiheesta"          0.30
    "rrggbb"            0.29
    Act Density 0.423%
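The listing does not define activation density. A common reading, assumed here, is the fraction of sampled tokens on which the feature fires at all. The sketch below illustrates that assumed definition with hypothetical toy data; `activation_density` and the `threshold` parameter are illustrative names, not taken from the source.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumed definition for illustration; the source does not state one.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 423 active tokens out of 100,000 sampled tokens.
toy_activations = [1.0] * 423 + [0.0] * 99_577
density = activation_density(toy_activations)
print(f"{density:.3%}")  # prints: 0.423%
```

Under this reading, a density of 0.423% would mean the feature is active on roughly 4 of every 1,000 tokens.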

    No Known Activations