INDEX
    Explanations

    instances where an action is taken alongside an alternative contrasting action

    instances of the word "also."

    Negative Logits
    Wr          -0.74
    anon        -0.74
    USD         -0.70
    crow        -0.69
    jam         -0.67
    ichen       -0.65
    atre        -0.65
    tten        -0.65
    ongyang     -0.65
    UD          -0.64
    Positive Logits
     cautioned      0.82
     incorporates   0.81
     optionally     0.81
     includes       0.81
     occasionally   0.77
     expressed      0.75
     risked         0.74
     encouraged     0.74
     hinted         0.73
     acted          0.73
    Activation Density: 0.063%

    No Known Activations