    Explanations

    phrases related to injustice or accountability in societal issues

    Negative Logits
        Token          Logit
        "bih"          -0.14
        " Plenty"      -0.14
        "oons"         -0.13
        "arin"         -0.13
        "uet"          -0.13
        "922"          -0.13
        "oS"           -0.13
        "iere"         -0.13
        " Independ"    -0.13
        "oten"         -0.13

    Positive Logits
        Token          Logit
        " escape"      0.34
        " escapes"     0.32
        " escaping"    0.32
        " escaped"     0.30
        "escape"       0.29
        " Escape"      0.27
        "Escape"       0.26
        " immunity"    0.26
        " impunity"    0.25
        "escaped"      0.25

    Activation Density: 0.085%

    No Known Activations