INDEX
    Explanations

    references to murder and related violent acts

    Negative Logits
    Token         Logit
    "nie"         -0.17
    " Thief"      -0.16
    "вар"         -0.15
    "imate"       -0.15
    "thora"       -0.15
    "htt"         -0.14
    "/buttons"    -0.14
    "stract"      -0.14
    "IMUM"        -0.14
    "_corners"    -0.14
    Positive Logits
    Token         Logit
    "ously"       0.32
    "abilia"      0.25
    "esses"       0.25
    "ous"         0.25
    "-su"         0.23
    " mystery"    0.21
    "ess"         0.21
    "OUS"         0.20
    " attempt"    0.18
    " plot"       0.18
    Act Density 0.023%

    No Known Activations