INDEX
    Explanations

    verbal commands or instructions starting with "don't."

    negations or expressions of inability

    Negative Logits
    Token             Logit
    " behavi"         -0.80
    " tremend"        -0.77
    "EStream"         -0.76
    " mosqu"          -0.75
    " サーティワン"    -0.73
    " gorilla"        -0.69
    " intern"         -0.69
    " cannabin"       -0.68
    " exha"           -0.66
    " Skydragon"      -0.66
    Positive Logits
    Token       Logit
    "ween"      1.07
    "aken"      1.06
    "otally"    1.04
    "asks"      1.04
    "ruck"      1.03
    "ractor"    1.02
    "olkien"    1.01
    "akers"     1.00
    "ople"      0.97
    "une"       0.97
    Activation Density: 0.127%

    No Known Activations