INDEX
    Explanations

    phrases indicating willingness or acceptance

    positive emotional states or expressions of happiness

    Negative Logits
     disproportion   -0.73
     GOODMAN         -0.68
    oons             -0.67
    Posts            -0.66
    senal            -0.64
    hur              -0.64
     Relief          -0.64
    arnaev           -0.63
    ulz              -0.61
    anan             -0.61
    Positive Logits
     embraced         0.88
     awaiting         0.81
     accepted         0.80
     awaited          0.78
     complied         0.78
     transitioned     0.77
     supplied         0.76
     acknowledged     0.76
     parted           0.75
     welcomed         0.75
    Act Density 0.097%

    No Known Activations