INDEX
    Explanations

    expressions of gratitude or requests for assistance

    expressions of desire or preference

    Negative Logits
    "VERTISEMENT"    -0.79
    "ccording"       -0.63
    "angular"        -0.63
    "onut"           -0.62
    "arding"         -0.59
    "ulty"           -0.58
    " Hazard"        -0.58
    "ilian"          -0.57
    "icol"           -0.56
    "ious"           -0.56
    Positive Logits
    " to"               0.71
    " clarification"    0.70
    " revenge"          0.69
    "lier"              0.67
    " assurances"       0.67
    "lihood"            0.63
    "ANY"               0.60
    " replicate"        0.59
    "ĸļ"                0.59
    " someone"          0.58
    Activation Density: 0.033%

    No Known Activations