    Explanations

    words related to negative judgments about a person's character or behavior

    derogatory terms or insults directed towards individuals

    Negative Logits
    undai       -0.95
    ells        -0.82
    earchers    -0.80
    elve        -0.80
    fman        -0.80
    anmar       -0.76
    usable      -0.76
    idays       -0.75
    ña          -0.75
    jong        -0.75
    Positive Logits
     idiot      0.95
     thief      0.79
     extraord   0.75
     idiots     0.72
     beware     0.72
     Investor   0.71
     hypoc      0.71
     liar       0.71
     kid        0.69
     loser      0.68
    Activation Density: 0.023%

    No Known Activations