INDEX
    Explanations

    instances of words describing violence or brutality

    Negative Logits
    "retty"        -0.17
    "onya"         -0.16
    "_GUID"        -0.14
    " Gardens"     -0.14
    "IMIT"         -0.14
    "ihat"         -0.14
    "itel"         -0.14
    "iye"          -0.14
    "cip"          -0.14
    " Dess"        -0.14

    Positive Logits
    "anc"           0.15
    "lemek"         0.14
    " evil"         0.14
    "ÄĻk"           0.14
    " fleet"        0.14
    "agon"          0.14
    "APE"           0.14
    " storm"        0.14
    " prim"         0.14
    "Allocator"     0.14
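    The two lists rank vocabulary tokens by how strongly this feature lowers or raises their logits. Lists like these are often produced by multiplying the feature's decoder direction by the model's unembedding matrix and keeping the k lowest and k highest entries. The sketch below assumes that approach. The functions `logit_effects` and `top_and_bottom` and the toy matrices are hypothetical illustrations, not the code behind this page, which may differ (for example, in whether a final LayerNorm is folded in).

```python
from typing import List, Sequence, Tuple

# Assumption: logit effect = decoder_dir @ W_U, as in many SAE dashboards.
# The real pipeline behind this page may differ (e.g. LayerNorm folding).


def logit_effects(decoder_dir: Sequence[float],
                  w_unembed: Sequence[Sequence[float]]) -> List[float]:
    """Return one score per vocabulary token: sum_i decoder_dir[i] * w_unembed[i][v].

    w_unembed has shape d_model x vocab_size (row i = residual dimension i).
    """
    d_model = len(decoder_dir)
    vocab_size = len(w_unembed[0])
    return [
        sum(decoder_dir[i] * w_unembed[i][v] for i in range(d_model))
        for v in range(vocab_size)
    ]


def top_and_bottom(scores: Sequence[float], vocab: Sequence[str],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Split tokens into the k most negative and k most positive logit effects."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]            # most suppressed tokens first
    positive = ranked[::-1][:k]      # most promoted tokens first
    return negative, positive


# Hypothetical toy model: d_model = 2, vocabulary of 4 tokens.
vocab = [" evil", " storm", " Gardens", "retty"]
decoder_dir = [1.0, 0.5]
w_unembed = [
    [0.2, 0.1, -0.1, -0.2],
    [0.0, 0.0, -0.1, 0.04],
]

scores = logit_effects(decoder_dir, w_unembed)
negative, positive = top_and_bottom(scores, vocab, k=2)
print([tok for tok, _ in negative])  # ['retty', ' Gardens']
print([tok for tok, _ in positive])  # [' evil', ' storm']
```

    Leading spaces in the quoted tokens matter, because BPE vocabularies treat " evil" and "evil" as different tokens.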
    Act Density 0.001%
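    An activation density of 0.001% means the feature fires on roughly one token in 100,000, because 0.001 / 100 = 10^-5. The sketch below shows that arithmetic. It assumes density is the percentage of sampled tokens whose activation is above zero. The threshold and the function name `activation_density` are illustrative assumptions and are not taken from this page.

```python
from typing import Sequence

# Assumption: "density" = share of tokens on which the feature's activation
# exceeds a threshold (0 here), expressed as a percentage.


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation is above `threshold`."""
    if not activations:
        raise ValueError("need at least one token activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical sample: 100,000 tokens, and the feature fires on exactly one.
sample = [0.0] * 100_000
sample[42] = 3.7
print(activation_density(sample))  # 0.001
```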

    No Known Activations