    Explanations

    phrases where someone is being labeled or called a negative term

    instances of negative labels and accusations directed at individuals

    Negative Logits
    intent  -0.71
    Instruct  -0.71
    imates  -0.67
     adjoining  -0.65
    ANS  -0.63
    aday  -0.63
     appl  -0.63
    grounds  -0.62
    Edit  -0.62
    MN  -0.62
    Positive Logits
     hoax  1.00
     "  0.91
     liar  0.88
     "'  0.82
     nuisance  0.80
     miracle  0.78
    versive  0.77
     typo  0.77
     fraud  0.76
     '  0.75
    Activation Density 0.141%

    No Known Activations