INDEX
    Explanations

    mentions of guns and gun-related terminology

    Negative Logits
    Token    Logit
    ерин     -0.17
    oha      -0.17
    ©        -0.16
    acus     -0.15
    cue      -0.15
    oped     -0.15
    hra      -0.14
    casts    -0.14
    tick     -0.14
    ражд     -0.14
    Positive Logits
    Token    Logit
    pow      0.29
    ning     0.22
    ned      0.20
    linger   0.20
    ny       0.20
    metal    0.20
    shots    0.19
    erals    0.18
    boat     0.17
    nen      0.17
    Act Density 0.023%
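
For readers unfamiliar with the Act Density figure, here is a minimal sketch of how such a value could be computed, assuming it means the fraction of sampled tokens on which the feature activates above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical illustrations, not taken from the dashboard's own tooling.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" is the share of sampled tokens where the feature
    fires at all; the dashboard's exact definition is not given in the source.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up sample: 23 active positions out of 100,000 tokens.
sample = [0.0] * 99_977 + [1.5] * 23
density = activation_density(sample)
print(f"Act Density {density:.3%}")
```

Under this reading, a density of 0.023% corresponds to roughly 23 active positions per 100,000 sampled tokens, so the feature fires very rarely.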

    No Known Activations