INDEX
    Explanations

    phrases indicating a choice or decision point

    phrases indicating choice or opinion, specifically contrasting options

    Negative Logits
    "ourses"      -0.66
    "inational"   -0.66
    "xus"         -0.66
    "vier"        -0.64
    "marked"      -0.62
    "urate"       -0.61
    "umper"       -0.61
    "runner"      -0.60
    "ipal"        -0.60
    "uph"         -0.59
    Positive Logits
    "lando"       0.88
    "acle"        0.81
    " hate"       0.75
    "acles"       0.74
    "Else"        0.73
    " Bust"       0.73
    " not"        0.72
    "hate"        0.70
    " starve"     0.70
    " lose"       0.69
    Activation Density: 0.052%

    No Known Activations