INDEX
    Explanations

    terms that express certainty or emphasis in a statement

    phrases and terms related to discrimination or prejudice against particular groups

    Head Attr Weights
    0:0.06
    1:0.03
    2:0.18
    3:0.07
    4:0.06
    5:0.10
    6:0.03
    7:0.02
    8:0.10
    9:0.24
    10:0.04
    11:0.03
    Negative Logits
    "amins"        -1.50
    "Jr"           -1.32
    "Mich"         -1.30
    "ゼウス"       -1.29
    "inar"         -1.25
    " Fra"         -1.25
    "usercontent"  -1.24
    " Bett"        -1.23
    "Calif"        -1.22
    " resil"       -1.21
    Positive Logits
    " clauses"      1.34
    " binding"      1.30
    " revised"      1.29
    " memor"        1.27
    " borrowing"    1.22
    " derivatives"  1.21
    "iths"          1.16
    " transfer"     1.15
    " ticking"      1.14
    " merged"       1.13
    Act Density 0.006%

    No Known Activations