INDEX
    Explanations

    imperatives or instructions

    directives or suggestions for action

    Negative Logits
    edge      -0.76
    emale     -0.70
    "]=>      -0.63
    uton      -0.63
    essions   -0.60
    hell      -0.60
    ieval     -0.60
    ungle     -0.59
    apo       -0.59
    ranch     -0.59
    Positive Logits
     yourselves  1.28
     yourself    1.23
     Yourself    0.89
     your        0.84
     wisely      0.84
     ye          0.81
     sparing     0.81
    cknow        0.78
     thy         0.74
     carefully   0.73
    Act Density 0.249%

    No Known Activations