INDEX
    Explanations

    insulting or critical phrases

    words related to negative criticisms and societal issues

    Negative Logits
    Token         Logit
    "minster"     -0.67
    "Newsletter"  -0.66
    " Downs"      -0.64
    "holm"        -0.62
    "kaya"        -0.61
    "Äĩ"          -0.60
    "boro"        -0.60
    "hover"       -0.60
    "rav"         -0.60
    "wake"        -0.59

    Positive Logits
    Token         Logit
    "lished"      0.70
    "ciating"     0.63
    " lett"       0.62
    "ļéĨĴ"        0.61
    " Geek"       0.57
    " metic"      0.57
    "essors"      0.57
    " UNC"        0.53
    " hybrids"    0.52
    "estyles"     0.51
    Activation Density: 0.534%

    No Known Activations