INDEX
    Explanations

    attends to injury-related tokens in explanations or clarifications that appear earlier in the sequence

    Head Attr Weights
    0:0.18
    1:0.19
    2:0.17
    3:0.09
    4:0.08
    5:0.09
    6:0.05
    7:0.12
    Negative Logits
     ویکی‌پدیا
    -0.36
    rungsseite
    -0.35
    ########.
    -0.35
    Rüyada
    -0.34
     '\\;'
    -0.34
     ſte
    -0.33
     ſta
    -0.32
    HideFlags
    -0.32
    SharedCtor
    -0.32
     Вес
    -0.32
    Positive Logits
    的她
    0.25
    tagHelperRunner
    0.24
    legd
    0.24
     ngang
    0.23
    awtextra
    0.23
     künftig
    0.23
     Kao
    0.23
    的他
    0.23
     Oviedo
    0.22
     gdyby
    0.22
    Act Density 0.075%

    No Known Activations