INDEX
    Explanations

    words related to disbelief, disgust, and disrespect

    Negative Logits
    Token       Logit
    " on"       -0.71
    " to"       -0.66
    " Rome"     -0.63
    " in"       -0.63
    " for"      -0.62
                -0.61
    " of"       -0.60
    " has"      -0.59
    " or"       -0.59
    ","         -0.59
    Positive Logits
    Token        Logit
    " dises"     1.75
    "fordable"   1.47
    " fatis"     1.37
    " hdi"       1.37
    " fta"       1.37
    " isuzu"     1.33
    " ftu"       1.32
    " dci"       1.32
    " imbal"     1.32
    " milano"    1.31
    Activation Density: 0.045%

    No Known Activations