    Explanations

    instructions related to physical self-defense techniques or tactics

    Negative Logits
    " fta"       -2.65
    " inev"      -2.61
    " emphat"    -2.60
    " secon"     -2.59
    " accla"     -2.58
    " depic"     -2.57
    " embra"     -2.55
    " squa"      -2.54
    " dises"     -2.53
    " ftu"       -2.50
    Positive Logits
    " try"       1.17
    " don"       1.11
    " you"       1.08
    " please"    1.08
    " consider"  1.07
    " remember"  1.04
    " let"       1.02
    " make"      1.02
    " choose"    1.00
    "try"        1.00
    Activation Density: 0.304%

    No Known Activations