    Explanations

    phrases related to negative or harmful actions or characteristics

    terms related to negative or harmful actions and sentiments

    Negative Logits
    "inet"       -0.94
    "ais"        -0.85
    "inances"    -0.81
    "rozen"      -0.80
    "ered"       -0.79
    "orius"      -0.79
    "liner"      -0.78
    "alist"      -0.78
    "lique"      -0.77
    "arb"        -0.75
    Positive Logits
    " nasty"        0.87
    " smear"        0.82
    "terday"        0.79
    " surprises"    0.75
    " poisons"      0.73
    " poison"       0.72
    "soever"        0.70
    " mud"          0.69
    " hello"        0.69
    " slander"      0.66
    Act Density 0.032%

    No Known Activations