INDEX
    Explanations

    instances where an unexpected outcome is described

    the phrase "even though" indicating contrast or concession

    Negative Logits
    Token               Logit
    "Eye"               -0.78
    "isible"            -0.65
    " ru"               -0.65
    "ursed"             -0.64
    "umped"             -0.63
    "cycl"              -0.62
    "ricks"             -0.61
    "aven"              -0.61
    "irt"               -0.60
    "Ingredients"       -0.60

    Positive Logits
    Token               Logit
    " acknowledging"    0.82
    "lihood"            0.77
    " deleting"         0.74
    " conced"           0.72
    "itals"             0.69
    " admitting"        0.69
    " admittedly"       0.68
    "clair"             0.68
    "olulu"             0.67
    "anamo"             0.67

    Act Density 0.026%

    No Known Activations