INDEX
    Explanations

    instances of physical conflict or violence

    Negative Logits
    inki        -0.18
     pornos     -0.15
    ìłĿ         -0.15
    æŀª         -0.15
    incy        -0.15
     ÑĩеÑĢв     -0.15
     unfavor    -0.14
    iddi        -0.14
    ãİ          -0.14
    λοÏį        -0.14
    Positive Logits
     fist        0.30
     fists       0.27
     punches     0.27
     punch       0.26
     physical    0.26
     boxing      0.25
     violence    0.24
     punching    0.24
     physically  0.24
     punched     0.23
    Activation Density: 0.151%

    No Known Activations