INDEX
    Explanations

    references to evil and malevolent actions or characters

    Negative Logits
    etch      -0.17
    acles     -0.16
    icle      -0.16
     пок      -0.16
    ingly     -0.16
    еÑĩ       -0.15
    ĤŃ        -0.15
    ech       -0.15
    olina     -0.15
    LIGHT     -0.14
    Positive Logits
    ness      0.18
    ution     0.18
     deeds    0.17
    -do       0.16
    lest      0.16
     intent   0.15
    UTION     0.15
    ulence    0.15
    nature    0.15
     Bunny    0.15
    Activation Density: 0.024%

    No Known Activations