    Explanations

    terms related to malevolence or wickedness

    Negative Logits
    Token           Logit
    "ylül"          -0.17
    "posable"       -0.17
    "eron"          -0.17
    "laz"           -0.16
    "лаж"           -0.16
    "ÅĻeba"         -0.15
    "egis"          -0.14
    "tle"           -0.14
    "çͲ"            -0.14
    "ussen"         -0.14

    Positive Logits
    Token           Logit
    "ution"         0.29
    " deeds"        0.26
    "-do"           0.26
    "ness"          0.25
    " intent"       0.23
    " intentions"   0.21
    "ulence"        0.20
    "intent"        0.20
    " deed"         0.20
    " intents"      0.20
    Activation Density: 0.026%

    No Known Activations