    Explanations

    violent actions or physical confrontations

    Negative Logits
    "lr"        -0.17
    "psilon"    -0.16
    " inc"      -0.16
    " neck"     -0.15
    " ži"       -0.14
    "atti"      -0.14
    " throat"   -0.14
    "esh"       -0.14
    "ih"        -0.14
    " Bols"     -0.14
    Positive Logits
    " against"   0.49
    "against"    0.46
    "Against"    0.45
    " Against"   0.44
    " tegen"     0.29
    " gegen"     0.27
    " contre"    0.27
    "对"         0.26
    " HARD"      0.24
    " проти"     0.24
    Activation Density 0.018%
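As a rough reading of the density figure: assuming "Activation Density" is the fraction of token positions on which the feature has a strictly positive activation (an assumption; the dashboard does not define it here), 0.018% corresponds to about 18 firing tokens per 100,000. The helper `activation_density` below is hypothetical, a minimal Python sketch of that definition rather than the dashboard's own code.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature fires (activation > threshold).

    Assumption: density means the share of tokens with activation above
    `threshold` (default 0.0); the dashboard's exact definition may differ.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Toy feature that fires on 18 of 100,000 token positions.
acts = [0.0] * 100_000
for i in range(18):
    acts[i * 5_000] = 1.0  # 18 distinct positions: 0, 5000, ..., 85000

print(f"{activation_density(acts):.3%}")  # → 0.018%
```

A feature this sparse fires rarely enough that a sample of text can easily contain no activating examples at all.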

    No Known Activations