INDEX
    Explanations

    occurrences of violence and injury-related terms

    Negative Logits
    uien        -0.16
    hape        -0.15
     Sne        -0.14
    itude       -0.14
    idget       -0.14
    azen        -0.14
    retch       -0.14
    IDGE        -0.14
    uner        -0.14
    inspace     -0.14
    Positive Logits
    èĿ                0.14
    plevel            0.14
     queryInterface   0.13
    amba              0.13
    547               0.13
    okol              0.13
    ảnh               0.13
    rung              0.13
    pend              0.13
    celik             0.13
    Activation Density: 0.051%

    No Known Activations