INDEX
    Explanations

    Attends to punctuation and grammatical relationships among tokens across sequences.

    Head Attr Weights
    0:0.08
    1:0.09
    2:0.07
    3:0.13
    4:0.18
    5:0.11
    6:0.22
    7:0.09
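
    As a reading aid for the head weights above, here is a minimal sketch that ranks the heads by weight. It assumes the `head:weight` pairs are per-head attribution shares; the page itself does not define them, and the variable names are illustrative only.

```python
# Head attribution weights copied from the listing above (head index -> weight).
head_weights = {0: 0.08, 1: 0.09, 2: 0.07, 3: 0.13, 4: 0.18, 5: 0.11, 6: 0.22, 7: 0.09}

# Rank heads from largest to smallest weight.
ranked = sorted(head_weights.items(), key=lambda kv: kv[1], reverse=True)
top_head, top_weight = ranked[0]

# The listed values are rounded, so the total need not be exactly 1.
total = sum(head_weights.values())

print(f"largest weight: head {top_head} ({top_weight:.2f})")
print(f"sum of listed weights: {total:.2f}")
```

    Under that assumption, heads 6 and 4 carry the largest shares, and no single head dominates.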
    Negative Logits
     betweenstory
    -0.47
    abestanden
    -0.47
     bezeichneter
    -0.46
    InvalidProtocol
    -0.44
     CreateTagHelper
    -0.43
     يتيمه
    -0.42
    UserScript
    -0.42
     дописавши
    -0.40
     Reſ
    -0.39
     ویکی‌پدیای
    -0.38
    Positive Logits
    esinde
    0.28
    disposing
    0.26
    iParam
    0.26
    ';
    0.25
    املة
    0.25
    மை
    0.25
    esta
    0.25
    ())).
    0.24
    ")));
    0.24
     '');
    0.23
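
    Lists like the two above are commonly produced by projecting a feature's output direction onto each token's unembedding vector and sorting the results. The sketch below illustrates that idea on toy data. It is an assumption about how such lists are typically built, not a description of how this page computed its values; `logit_effects`, the toy direction, and the toy vocabulary are all hypothetical.

```python
def logit_effects(direction, unembed):
    """Dot the feature direction with each token's unembedding vector.

    direction: list of floats (hypothetical feature output direction)
    unembed:   dict mapping token -> list of floats of the same length
    """
    return {
        token: sum(d * w for d, w in zip(direction, vector))
        for token, vector in unembed.items()
    }


# Toy two-dimensional example; none of these numbers come from the page.
direction = [1.0, -0.5]
unembed = {
    "a": [0.2, 0.0],
    "b": [-0.4, 0.2],
    "c": [0.0, 0.5],
}

effects = logit_effects(direction, unembed)

# Sort from most promoted to most suppressed token.
ranked_tokens = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)
most_promoted = ranked_tokens[0][0]
most_suppressed = ranked_tokens[-1][0]
```

    In this toy setup the first entry of `ranked_tokens` plays the role of the top positive logit and the last entry the top negative logit.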
    Act Density 0.054%
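
    A density figure of this kind is usually the share of sampled tokens on which the feature fires. The sketch below assumes that definition (activation strictly above a threshold, reported as a percentage); the page does not state its exact definition, and `activation_density` is a hypothetical helper.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy sample: 10,000 token positions, 5 of which activate the feature.
toy_activations = [0.0] * 9995 + [0.3, 1.2, 0.7, 2.1, 0.05]
density = activation_density(toy_activations)
print(f"Act Density {density:.3f}%")
```

    On this toy sample the feature fires on 5 of 10,000 positions, the same order of magnitude as the 0.054% reported above.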

    No Known Activations