INDEX
    Explanations

    attends to discrimination-related tokens from tokens that name specific categories of discrimination

    Head Attr Weights
    0:0.11
    1:0.15
    2:0.12
    3:0.11
    4:0.11
    5:0.04
    6:0.12
    7:0.19
    Negative Logits
    AddTagHelper
    -0.40
    PreferredItem
    -0.38
    addCriterion
    -0.36
    mappedBy
    -0.36
    InstrumentedTest
    -0.34
    FunctionFlags
    -0.32
     AssemblyTitle
    -0.31
     незавершена
    -0.31
    IsMutable
    -0.30
     lenker
    -0.29
    Positive Logits
     ๆ
    0.32
     Ruß
    0.28
    modb
    0.28
    0.26
    もしれない
    0.26
    참고
    0.26
     Kommission
    0.26
    aba
    0.26
    もしれません
    0.25
     СП
    0.25
    Act Density 0.129%

    No Known Activations