    Explanations

    phrases related to negative attitudes or disrespect towards individuals or institutions

    expressions of contempt and mistrust

    Negative Logits
        Token               Logit
        " Lans"             -0.67
        " hemor"            -0.63
        " encyclopedia"     -0.63
        " advoc"            -0.63
        " Som"              -0.62
        " stabilization"    -0.61
        " Explan"           -0.61
        " Publishers"       -0.60
        " Rum"              -0.58
        " misunder"         -0.58
    Positive Logits
        Token               Logit
        "uous"              1.54
        "uously"            1.42
        "ible"              1.04
        "ful"               1.03
        "fully"             0.96
        "urous"             0.95
        "ibly"              0.94
        "FUL"               0.93
        "orable"            0.92
        "ensible"           0.92
    Activation Density: 0.097%

    No Known Activations