INDEX
    Explanations

    instances of violence or gun-related actions

    Negative Logits
    Token          Logit
    "LOAT"         -0.16
    "endir"        -0.16
    " cru"         -0.15
    " Pipes"       -0.15
    "agg"          -0.15
    "ãģ£ãģı"       -0.15
    "UnderTest"    -0.14
    "igated"       -0.14
    "533"          -0.14
    "uctor"        -0.14
    Positive Logits
    Token          Logit
    "aub"          0.17
    "obus"         0.15
    "meli"         0.14
    "AZY"          0.14
    " salv"        0.14
    "nist"         0.13
    "BG"           0.13
    "MG"           0.13
    "utters"       0.13
    "az"           0.13
    Act Density 0.068%

    No Known Activations