INDEX
    Explanations

    references to explosive devices or bombs

    Negative Logits
    oldem         -0.17
    yte           -0.16
    ernals        -0.16
    ful           -0.15
    yt            -0.14
    ofday         -0.14
    ymous         -0.14
    .Inf          -0.14
    .GroupLayout  -0.14
    INF           -0.14
    Positive Logits
    shell         0.30
    arded         0.27
    arding        0.26
    ard           0.22
    astic         0.21
    astically     0.20
     bomb         0.19
    ination       0.18
    (shell        0.18
    remen         0.18
    Activation Density: 0.010%

    No Known Activations