INDEX
    Explanations

    expressions of opinions or reactions

    expressions of feelings related to satisfaction or dissatisfaction

    Negative Logits
        " cheat"    -0.70
        " dated"    -0.67
        "amen"      -0.65
        "allow"     -0.62
        "Ranked"    -0.60
        "allows"    -0.60
        "ramid"     -0.59
        "hang"      -0.59
        "oret"      -0.58
        "igham"     -0.58
    Positive Logits
        " aback"    0.91
        "ragon"     0.71
        "dy"        0.69
        " hearing"  0.69
        " by"       0.68
        " seeing"   0.65
        " Hearing"  0.65
        "ienced"    0.64
        " about"    0.63
        " citiz"    0.63
    Activation Density: 0.150%
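The activation density figure is read here as the percentage of tokens on which the feature fires, that is, where its activation exceeds zero. This reading is an assumption, since the page does not define the threshold. Under it, 0.150% means the feature is active on roughly 3 of every 2,000 tokens. The function name `activation_density` and the sample data are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy data: 3 active tokens out of 2,000.
acts = [0.0] * 1997 + [1.2, 0.4, 3.0]
density = activation_density(acts)
print(f"{density:.3f}%")  # 0.150%
```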

    No Known Activations