INDEX
    Explanations

    language related to activism and social justice

    Negative Logits
     shave
    -0.15
     Leakage
    -0.14
    rophy
    -0.14
    essim
    -0.14
    ossal
    -0.14
    anship
    -0.13
    Risk
    -0.13
    风险
    -0.13
    azar
    -0.13
    ennen
    -0.12
    Positive Logits
     equality
    0.34
     equal
    0.34
     justice
    0.34
     equity
    0.31
     rights
    0.30
     fairness
    0.28
     fair
    0.28
     liberties
    0.25
     human
    0.25
    equal
    0.25
    Activation Density 0.191%

    No Known Activations