INDEX
    Explanations

    sentences expressing societal criticism related to race and inequality

    Negative Logits
    Token             Logit
    " Efq"            -1.15
    "ſelves"          -1.14
    " myſelf"         -1.14
    "ſelf"            -1.09
    " purpoſe"        -1.09
    " Monfieur"       -1.07
    " Theſe"          -0.99
    " Jefus"          -0.98
    " itſelf"         -0.98
    " faſt"           -0.96

    Positive Logits
    Token             Logit
    " still"           0.65
    "still"            0.59
    "Still"            0.58
    "?"                0.54
    " Still"           0.53
    "!"                0.51
    " ainda"           0.49
    "..."              0.48
    " STILL"           0.48
    " nevertheless"    0.47

    Activation Density: 0.138%

    No Known Activations