    Explanations

    references to guns and gun-related terminology

    Negative Logits
    Token       Logit
    ерин        -0.18
    tte         -0.16
    cue         -0.16
    hid         -0.15
    een         -0.15
    crast       -0.15
    cin         -0.15
    hra         -0.15
    gor         -0.15
    AsStream    -0.15
    Positive Logits
    Token       Logit
    pow         0.35
    metal       0.27
    ned         0.25
    ning        0.25
    ny          0.24
    ners        0.24
    shots       0.24
    ner         0.23
    boat        0.22
    fight       0.22
    Activation Density: 0.024%

    No Known Activations