INDEX
    Explanations

    phrases and concepts related to social dynamics and privilege

    Negative Logits
    Token       Logit
    thumbnail   -0.17
    HEN         -0.16
    ίκ          -0.15
    hiba        -0.15
    reon        -0.15
    Haz         -0.14
    oklyn       -0.14
    áºŃy        -0.14
    ALLENG      -0.14
    groupBox    -0.14

    Positive Logits
    Token       Logit
    iner        0.16
    dae         0.15
    afd         0.15
    ç»ıéªĮ      0.14
     hol        0.14
    inh         0.14
    .glide      0.14
    iná         0.14
     reim       0.14
    Std         0.14
    Act Density 0.460%

    No Known Activations