INDEX
    Explanations

    words related to criticism or negative evaluation

    terms related to significant consequences or effects

    Negative Logits
     Awakens       -0.68
     Kardash       -0.66
     Masquerade    -0.57
     fitt          -0.54
     trusts        -0.52
     didnt         -0.51
     Kenn          -0.51
     retrie        -0.50
     Wes           -0.49
     Mak           -0.49
    Positive Logits
    lie             0.72
    maxwell         0.68
    ieu             0.68
    JECT            0.62
    onite           0.61
    olina           0.61
    ril             0.61
    oton            0.59
    acus            0.58
    inguished       0.58
    Activation Density: 1.571%

    No Known Activations