INDEX
    Explanations

    references to physical violence such as assault and mugging

    occurrences of the word "mug" and related references

    Negative Logits
    Token         Logit
    "ISION"       -0.77
    "edient"      -0.72
    " Virgin"     -0.70
    "Domin"       -0.69
    "×Ļ×"         -0.68
    "ISE"         -0.68
    "IGH"         -0.67
    " Doctrine"   -0.66
    "ipher"       -0.66
    "cision"      -0.64

    Positive Logits
    Token         Logit
    " mug"         1.12
    "gers"         1.11
    "shots"        1.06
    "ging"         1.02
    "shot"         0.98
    "ger"          0.96
    "atures"       0.92
    "ged"          0.88
    "glers"        0.86
    "gery"         0.84
    Act Density 0.007%

    No Known Activations