INDEX
    Explanations

    The phrase "don't" and its variants, indicating a negative imperative or advice

    Negative Logits
    erno        -0.18
    ffe         -0.16
    fter        -0.15
    ual         -0.15
    .freeze     -0.15
    clide       -0.14
    šak         -0.14
    ually       -0.14
    intl        -0.14
    .pivot      -0.14

    Positive Logits
    cel          0.16
    indh         0.15
    afari        0.14
    аза          0.14
    olith        0.14
    ipers        0.14
    ascus        0.14
    itored       0.13
    argout       0.13
    chant        0.13

    Activation Density: 0.077%

    No Known Activations