    Explanations

    phrases indicating a command or instruction

    negative imperatives or prohibition phrases

    Negative Logits
    " ancest"       -0.72
    "CVE"           -0.63
    " Reloaded"     -0.62
    " Frie"         -0.61
    "milo"          -0.56
    " behavi"       -0.56
    "parser"        -0.56
    "gnu"           -0.55
    "wrapper"       -0.55
    " spirited"     -0.54
    Positive Logits
    " hesitate"      0.90
    " forget"        0.89
    " bother"        0.87
    " expect"        0.86
    "Í"              0.85
    "intend"         0.81
    " necessarily"   0.80
    " CARE"          0.80
    "ude"            0.80
    "erest"          0.77
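    The logit lists rank output tokens by how strongly this feature raises (positive) or lowers (negative) their logits. One common way to produce such lists, assumed here rather than confirmed by this page, is to project the feature's decoder direction through the model's unembedding matrix and take the highest and lowest k entries. The sketch below uses hypothetical names (`top_logit_effects`, `decoder_dir`, `W_U`, `vocab`) and ignores any normalization a real pipeline might apply.

```python
import numpy as np


def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Rank tokens by how much a feature direction raises or lowers their logits.

    Hypothetical helper (assumption, not this page's actual pipeline):
      decoder_dir: (d_model,) decoder vector of the feature
      W_U:         (d_model, vocab_size) unembedding matrix
      vocab:       list of token strings, indexed like W_U's columns
    """
    # Direct effect of the feature direction on every output logit.
    logits = np.asarray(decoder_dir) @ np.asarray(W_U)  # shape (vocab_size,)
    order = np.argsort(logits)  # ascending: most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative


# Toy usage with an identity "unembedding", so logits equal decoder_dir.
vocab = ["a", "b", "c", "d"]
pos, neg = top_logit_effects(np.array([0.9, -0.7, 0.1, 0.0]), np.eye(4), vocab, k=2)
print(pos)  # tokens the direction promotes most
print(neg)  # tokens the direction suppresses most
```

    Real dashboards may also fold in a final layer norm or rescale the scores, so values from this sketch need not match the numbers listed here.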
    Activation Density: 0.057%
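    An activation density of 0.057% means the feature fired on roughly 57 of every 100,000 sampled token positions. A minimal sketch of that calculation, assuming a position counts as active whenever the activation is strictly greater than a hypothetical `threshold` (0 by default, as for a ReLU-style feature):

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Percentage of token positions where the feature activation exceeds threshold.

    `acts` holds one activation value per token position (hypothetical input).
    """
    acts = np.asarray(acts)
    active = np.count_nonzero(acts > threshold)
    return 100.0 * active / acts.size


# 57 active positions out of 100,000 corresponds to a density of about 0.057%.
acts = np.zeros(100_000)
acts[:57] = 1.0
print(activation_density(acts))
```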

    No Known Activations