INDEX
    Explanations

    phrases indicating surprise or strong emphasis

    phrases indicating negation or denial

    Negative Logits
    Token       Logit
    "RAFT"      -0.70
    "roxy"      -0.63
    "ULTS"      -0.61
    "ousand"    -0.60
    "Posts"     -0.60
    " Comes"    -0.59
    "inese"     -0.57
    "rox"       -0.56
    "perse"     -0.56
    "CENT"      -0.56
    Positive Logits
    Token              Logit
    "xious"            1.23
    " longer"          1.17
    "ct"               0.98
    "except"           0.91
    " doubt"           0.91
    " matter"          0.84
    " exception"       0.84
    " indication"      0.83
    "otrop"            0.82
    " exaggeration"    0.77
    Activation Density: 0.044%

    No Known Activations