INDEX
    Explanations

    words and phrases related to dehumanization and similar concepts

    Negative Logits
    Token        Logit
    "дов"        -0.16
    "yt"         -0.16
    "mesinin"    -0.15
    "odi"        -0.14
    "378"        -0.14
    "fried"      -0.14
    "led"        -0.14
    "levant"     -0.14
    "fern"       -0.14
    "ium"        -0.14
    Positive Logits
    Token        Logit
    "ัà¸ģà¸Ĺ"     0.15
    " rig"       0.15
    "kart"       0.15
    " de"        0.15
    "#ad"        0.15
    "AndWait"    0.14
    "绣"         0.14
    "asket"      0.14
    "atego"      0.14
    " Urg"       0.14
    Activation Density: 0.041%

    No Known Activations