INDEX
    Explanations

    phrases related to physical acts of aggression or impact

    references to the word "punch" in various contexts

    Negative Logits
    Token         Logit
    "uve"         -0.92
    "abeth"       -0.76
    "aird"        -0.68
    "icter"       -0.65
    "Private"     -0.64
    " Citizen"    -0.64
    " Neural"     -0.64
    "udic"        -0.62
    "rians"       -0.61
    " Archdemon"  -0.61
    Positive Logits
    Token         Logit
    "bowl"        1.25
    " punches"    0.87
    " punch"      0.84
    "aneers"      0.80
    "bag"         0.76
    "istani"      0.75
    "outs"        0.75
    " punching"   0.74
    "sticks"      0.74
    "cart"        0.74
    Activation Density: 0.014%

    No Known Activations