INDEX
    Explanations

    insulting or derogatory terms to describe people

    harsh insults and derogatory language

    Negative Logits
    Token           Logit
    "imester"       -0.87
    "AMS"           -0.80
    "ItemImage"     -0.78
    "istry"         -0.77
    "conom"         -0.75
    "isine"         -0.73
    "eton"          -0.69
    "ISTORY"        -0.69
    "anship"        -0.68
    " readiness"    -0.67
    Positive Logits
    Token           Logit
    " bunny"        1.26
    " bastard"      1.25
    " puppy"        1.22
    " guy"          1.21
    " dude"         1.20
    " monkey"       1.19
    " bitch"        1.17
    " gorilla"      1.17
    " ape"          1.17
    " jerk"         1.17
    Activation Density: 0.664%

    No Known Activations