INDEX
    Explanations

    phrases related to controversial statements or actions made by public figures

    instances of names and titles associated with accusations or negative labels

    Negative Logits
    icion           -0.80
    enture          -0.69
    itely           -0.68
    regate          -0.68
    ovember         -0.63
    pta             -0.62
    few             -0.61
    ptoms           -0.61
    éĥ              -0.61
    gomery          -0.61
    Positive Logits
     unacceptable    1.08
     unreliable      0.99
     irresponsible   0.99
     unfit           0.98
     unsu            0.97
     obsolete        0.97
     unworthy        0.95
     unethical       0.95
     illegitimate    0.92
     "'              0.91
    Activation Density: 0.141%

    No Known Activations