    Explanations

    phrases that bring attention to social and political issues

    Negative Logits
        Token         Logit
        "--+"         -0.78
        "fork"        -0.77
        "ée"          -0.72
        "Iterator"    -0.69
        "NING"        -0.69
        "beam"        -0.69
        "tails"       -0.67
        "andowski"    -0.66
        "ulla"        -0.66
        "pour"        -0.66

    Positive Logits (leading spaces are part of the tokens)
        Token             Logit
        " injust"         1.02
        " dangers"        0.92
        " shortcomings"   0.88
        " misogyny"       0.88
        " atrocities"     0.87
        " wrongdoing"     0.87
        " homosexuality"  0.87
        " abuses"         0.87
        " issues"         0.86
        " sexism"         0.83
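
    The two logit lists rank vocabulary tokens by how strongly this feature pushes their output logits down or up. A minimal sketch of one common way such lists are produced, assuming the feature's decoder direction is projected straight through the model's unembedding matrix (ignoring the final LayerNorm). The function name `top_logit_effects` and its arguments are hypothetical, not taken from the dashboard.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Return the k most negative and k most positive direct logit effects.

    Assumption: decoder_dir has shape [d_model] (the feature's output direction)
    and W_U has shape [d_model, d_vocab] (the unembedding); downstream layers
    and the final LayerNorm are ignored.
    """
    logit_effect = decoder_dir @ W_U          # one score per vocabulary token
    order = np.argsort(logit_effect)          # token indices, most negative first
    negative = [(vocab[i], float(logit_effect[i])) for i in order[:k]]
    positive = [(vocab[i], float(logit_effect[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: 2-dimensional residual stream, 4-token vocabulary.
neg, pos = top_logit_effects(
    np.array([1.0, 0.0]),
    np.array([[0.5, -0.3, 1.0, -0.8],
              [9.0, 9.0, 9.0, 9.0]]),
    ["a", "b", "c", "d"],
    k=2,
)
print(neg)  # [('d', -0.8), ('b', -0.3)]
print(pos)  # [('c', 1.0), ('a', 0.5)]
```

    Sorting once and reading both ends of the ordering gives the negative and positive lists from a single pass over the vocabulary.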
    Activation Density: 0.119%
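
    Activation density is read here as the share of token positions on which the feature fires, so 0.119% corresponds to roughly 119 activating tokens per 100,000. A minimal sketch of that calculation, assuming "fires" means an activation above a threshold of zero; the function name and threshold are hypothetical.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumption: density = (number of positions with activation > threshold)
    divided by (total number of positions), expressed as a percentage.
    """
    acts = np.asarray(activations, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Two of eight positions are active, giving 25.0 (percent).
print(activation_density([0.0, 0.0, 1.2, 0.0, 0.3, 0.0, 0.0, 0.0]))
```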

    No Known Activations