    Explanations

    instances where one option or action is preferred over another

    the repeated use of the word "instead."

    NEGATIVE LOGITS
    Token          Logit
    "cision"       -0.72
    "ongo"         -0.71
    "mud"          -0.67
    "fried"        -0.65
    "lees"         -0.64
    "raz"          -0.64
    "anon"         -0.64
    "minent"       -0.64
    " Shake"       -0.63
    "rament"       -0.63
    POSITIVE LOGITS
    Token          Logit
    " opting"      0.88
    " instead"     0.84
    "instead"      0.80
    " preferring"  0.70
    " chose"       0.70
    " opt"         0.68
    " cannabin"    0.68
    " opted"       0.67
    " passively"   0.67
    "artments"     0.67
    Activation Density 0.020%

    No Known Activations