INDEX
    Explanations

    words related to physical assault or violence

    references to the word "mug" in various contexts

    Negative Logits
    Token       Logit
    " BCE"      -0.77
    "Phase"     -0.76
    "RC"        -0.70
    "phas"      -0.69
    "COM"       -0.68
    " EA"       -0.68
    "CE"        -0.68
    "urance"    -0.67
    "loop"      -0.67
    "׾"         -0.66
    Positive Logits
    Token       Logit
    " mug"      4.13
    " Mug"      2.26
    " jug"      1.10
    " fug"      1.10
    " robber"   1.05
    " robbed"   0.99
    " thug"     0.98
    " bust"     0.91
    " wig"      0.90
    " burg"     0.90
    Activation Density: 0.017%

    No Known Activations