INDEX
    Explanations

    phrases indicating doubt or possibility

    phrases that suggest uncertainty or speculation

    Negative Logits
    lins          -0.78
    waters        -0.75
    raint         -0.74
    board         -0.73
    eries         -0.72
    elson         -0.71
    rix           -0.70
    kowski        -0.70
    raged         -0.69
    ioch          -0.69
    Positive Logits
     misunder      0.87
    querque        0.84
     jeopard       0.80
    interstitial   0.79
     forgiven      0.75
    merce          0.74
     infer         0.73
    uthor          0.72
     surv          0.71
     swayed        0.70
    Act Density 0.017%

    No Known Activations