INDEX
    Explanations

    questions starting with "Why don't" or similar phrasings

    negations or questions that challenge the status quo

    Negative Logits
    Token                   Logit
    "ORGE"                  -0.83
    "urst"                  -0.79
    "isers"                 -0.73
    "onyms"                 -0.71
    "soType"                -0.71
    "EV"                    -0.70
    "arov"                  -0.70
    "ocene"                 -0.70
    "quickShipAvailable"    -0.69
    " Starts"               -0.69
    Positive Logits
    Token                   Logit
    " properly"             0.83
    " adequately"           0.81
    " vacc"                 0.73
    "itia"                  0.69
    " mention"              0.67
    "icable"                0.66
    " reinvest"             0.65
    " assimil"              0.65
    " bother"               0.65
    " priorit"              0.64
    Activation Density: 0.182%

    No Known Activations