INDEX
    Explanations

    words related to negation or reversal, often in the form of prefixes like "un-" or "anti-"

    words related to undignified or unrefined behavior

    Negative Logits
    Token      Logit
    anwhile    -0.80
    ppo        -0.79
    Ĥİ         -0.75
    auga       -0.74
    å§«        -0.73
    tsky       -0.73
    azines     -0.72
    bley       -0.72
    zzo        -0.72
    ramid      -0.69
    Positive Logits
    Token      Logit
    etermined  1.09
    oubt       1.04
    iscovered  1.03
    aunted     0.99
    irect      0.96
    ec         0.95
    epend      0.93
    ifferent   0.91
    amed       0.88
    etermin    0.87
    Activation Density: 0.011%

    No Known Activations