INDEX
    Explanations

    references to the concept of "malice" or negative traits

    Negative Logits
    <bos>             -2.43
    Географи          -0.70
    setDo             -0.70
    AutoScaleMode     -0.68
    Петербург         -0.65
     AppCompatTheme   -0.64
    UseVisualStyle    -0.63
    Hướng             -0.63
    mergeFrom         -0.62
    <tfoot>           -0.62
    Positive Logits
     thut             1.72
     ftu              1.62
     perfon           1.60
     effe             1.60
     reft             1.59
     wien             1.59
     sovere           1.55
     stockholm        1.53
     Augu             1.52
     §.               1.52
    Activation Density 0.065%

    No Known Activations