    Explanations

    terms associated with violence and brutality

    Negative Logits
    "amd"        -0.17
    "brush"      -0.16
    "tra"        -0.16
    "grund"      -0.16
    " latter"    -0.16
    " strav"     -0.15
    " brute"     -0.14
    " grace"     -0.14
    "exion"      -0.14
    "-thirds"    -0.14

    Positive Logits
    "shaw"       0.20
    "以为"       0.19
    "zeitig"     0.17
    "ly"         0.17
    "ulent"      0.16
    "acht"       0.15
    "ummer"      0.15
    "bite"       0.15
    "uffles"     0.15
    "imal"       0.15
    Act Density 0.553%

    No Known Activations