INDEX
    Explanations

    mentions of firearms or weapons, particularly focusing on guns

    Negative Logits
    "grad"      -0.72
    "Fair"      -0.69
    "Attempts"  -0.66
    "lihood"    -0.65
    "spect"     -0.64
    "UTE"       -0.64
    "Work"      -0.64
    "Benef"     -0.63
    "Solution"  -0.62
    "eff"       -0.61
    Positive Logits
    "linger"    1.34
    " blazing"  1.21
    "hips"      1.15
    " guns"     1.12
    "mith"      1.09
    "powder"    1.06
    "poons"     0.97
    "hops"      0.96
    "hooting"   0.95
    "hip"       0.93
    Activation Density: 0.016%

    No Known Activations