INDEX
    Explanations

    words related to societal concerns and safety issues

    Negative Logits
    "awah"         -0.16
    ".Glide"       -0.15
    "ehler"        -0.15
    "arias"        -0.15
    " disag"       -0.15
    "lage"         -0.14
    " Animalia"    -0.14
    "avig"         -0.14
    "/REC"         -0.14
    "mada"         -0.14
    Positive Logits
    "iscard"       0.15
    "igy"          0.15
    " T"           0.14
    "ince"         0.14
    " Zar"         0.14
    " DLC"         0.14
    " BAT"         0.14
    "ić"           0.14
    "azz"          0.14
    " Gust"        0.14
    Activation Density: 0.022%

    No Known Activations