INDEX
    Explanations

    topics related to sexism and racism in society

    Negative Logits
    Token           Logit
    " Bers"         -0.16
    "_rq"           -0.16
    "ocker"         -0.16
    "izen"          -0.15
    "ogue"          -0.15
    "liers"         -0.14
    "İ·"            -0.14
    "ifestyles"     -0.13
    "apel"          -0.13
    " UIWindow"     -0.13
    Positive Logits
    Token           Logit
    " towards"      0.41
    " toward"       0.38
    " against"      0.36
    "against"       0.31
    " Against"      0.30
    " Towards"      0.29
    "Towards"       0.28
    "Against"       0.27
    " Tow"          0.27
    "åIJij"         0.25
    Activation Density: 0.136%

    No Known Activations