    Explanations

    Expressions related to sociopolitical opinions and discourse

    Contradictory or disagreeable statements

    Negative Logits
    Token              Logit
    " autorytatywna"   -0.77
    " հղումներ"        -0.70
    "rungsseite"       -0.67
    "rrggbb"           -0.66
    " Picchu"          -0.65
    "Попис"            -0.65
    "Autoritní"        -0.65
    "bcryptjs"         -0.64
    "нгред"            -0.62
    " propOrder"       -0.62
    Positive Logits
    Token              Logit
    " I"                0.38
                        0.35
    " blind"            0.35
    " who"              0.31
    " blinded"          0.30
    " racist"           0.29
    " this"             0.29
    " his"              0.28
    "comment"           0.28
    "em"                0.28
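    Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking the per-token scores. The sketch below assumes that method; the helper `top_logits`, the toy 2-dimensional residual stream, and the four-token vocabulary are hypothetical, and a real dashboard may apply extra steps (for example, folding in layer norm) that are not shown here.

```python
import numpy as np

def top_logits(decoder_dir, W_U, vocab, k=10):
    """Rank tokens by the logit a feature's decoder direction writes to them.

    decoder_dir: (d_model,) decoder vector of one feature (assumed input).
    W_U: (d_model, vocab_size) unembedding matrix (assumed input).
    Returns (most_negative, most_positive), each a list of (token, logit).
    """
    logits = np.asarray(decoder_dir) @ np.asarray(W_U)  # shape (vocab_size,)
    order = np.argsort(logits)  # token indices, ascending by logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Hypothetical toy model: 2-dim residual stream, 4-token vocabulary.
vocab = ["tok0", "tok1", "tok2", "tok3"]
W_U = np.array([[0.3, -0.7, 0.1, 0.5],
                [0.9,  0.9, 0.9, 0.9]])
decoder_dir = np.array([1.0, 0.0])  # hypothetical feature direction
neg, pos = top_logits(decoder_dir, W_U, vocab, k=2)
print(neg)  # [('tok1', -0.7), ('tok2', 0.1)]
print(pos)  # [('tok3', 0.5), ('tok0', 0.3)]
```

    Sorting once and slicing both ends keeps the negative and positive lists consistent with each other.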
    Activation Density: 0.510%
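    Activation density is usually read as the percentage of token positions on which the feature's activation is above zero. The sketch below assumes that definition; the helper `activation_density`, its `threshold` parameter, and the activations are hypothetical.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of token positions where the activation exceeds threshold."""
    acts = np.asarray(acts, dtype=float)
    if acts.size == 0:
        raise ValueError("need at least one activation")
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Hypothetical activations over 1,000 tokens, 5 of them non-zero.
acts = np.zeros(1000)
acts[[3, 40, 41, 500, 999]] = [1.2, 0.4, 2.0, 0.7, 0.1]
print(f"Activation Density: {activation_density(acts):.3f}%")  # 0.500%
```

    Under this definition, a density of 0.510% means the feature fires on roughly 5 of every 1,000 tokens.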

    No Known Activations