INDEX
    Explanations

    phrases related to serious or harmful actions

    terms related to severe harm or injury

    Negative Logits
    Token        Logit
    "XL"         -0.72
    " Elves"     -0.72
    "fix"        -0.71
    "den"        -0.71
    "wallet"     -0.68
    "girl"       -0.67
    " Diver"     -0.66
    "gamer"      -0.66
    "cloth"      -0.65
    "starter"    -0.65
    Positive Logits
    Token         Logit
    "ous"         1.20
    "ously"       1.19
    "ising"       1.18
    "icates"      1.09
    "ues"         1.08
    "ized"        1.08
    "izations"    1.07
    "icable"      1.06
    "izing"       1.06
    "istic"       1.05
    Activation Density: 0.046%

    No Known Activations