INDEX
    Explanations

    concepts related to social justice and equity issues, particularly around fairness and inequality

    Negative Logits
    "away"          -0.18
    "onas"          -0.17
    "alem"          -0.16
    "NotAllowed"    -0.15
    " Learned"      -0.14
    " spoof"        -0.14
    "ιÏĩ"           -0.14
    "ertia"         -0.14
    " Ulus"         -0.14
    "ippy"          -0.13
    Positive Logits
    " fail"         0.39
    " fails"        0.38
    " overlook"     0.37
    " miss"         0.36
    " misses"       0.36
    " neglect"      0.33
    "miss"          0.31
    " ignore"       0.30
    " ignores"      0.29
    " masks"        0.28
    Activation Density: 0.312%

    No Known Activations