INDEX
    Explanations

    negative statements or criticisms

    negations and phrases expressing doubt or lack of certainty

    Negative Logits
    "olon"       -0.73
    "lique"      -0.73
    "cerpt"      -0.68
    "ali"        -0.67
    "ription"    -0.66
    "lesi"       -0.65
    "ourse"      -0.65
    "lav"        -0.65
    " Lux"       -0.65
    "andan"      -0.64
    Positive Logits
    " darn"        0.80
    "iga"          0.74
    "IFT"          0.68
    "Cola"         0.66
    "hin"          0.64
    "CO"           0.63
    "Sad"          0.62
    " divorced"    0.62
    "WHERE"        0.62
    "ifiable"      0.62
    Activation Density: 0.092%

    No Known Activations