INDEX
    Explanations

    references to murder and related violent acts

    Negative Logits
    Token         Logit
    "вар"         -0.17
    " Thief"      -0.15
    "lage"        -0.15
    "htt"         -0.14
    "_corners"    -0.14
    "nie"         -0.14
    "906"         -0.14
    "unes"        -0.14
    "imate"       -0.14
    "eward"       -0.14
    Positive Logits
    Token         Logit
    "ously"       0.33
    "ous"         0.28
    "abilia"      0.25
    "esses"       0.23
    "-su"         0.21
    " mystery"    0.20
    " spree"      0.20
    "OUS"         0.20
    "pedia"       0.18
    " scene"      0.18
    Activation Density: 0.016%

    No Known Activations