    Explanations

    expressions of gratitude or thanks

    phrases that express a desire or preference

    Negative Logits
    Token            Logit
    "VERTISEMENT"    -0.80
    "angular"        -0.71
    "ingen"          -0.69
    "icol"           -0.69
    "ulty"           -0.66
    "onut"           -0.65
    "aan"            -0.62
    "shock"          -0.62
    "@#&"            -0.61
    "illusion"       -0.61
    Positive Logits
    Token            Logit
    "lier"           0.82
    " revenge"       0.75
    " assurances"    0.69
    " to"            0.67
    "lihood"         0.66
    " forgiveness"   0.65
    "fully"          0.63
    "ANY"            0.61
    " redress"       0.61
    "ably"           0.61
    Activation Density: 0.024%

    No Known Activations