INDEX
    Explanations

    patterns related to social issues, particularly those involving race and discrimination

    NEGATIVE LOGITS
    Token           Logit
    " "             -0.15
    "399"           -0.14
    "ansk"          -0.14
    " Harm"         -0.14
    "inn"           -0.14
    " harm"         -0.13
    " hindsight"    -0.13
    " steps"        -0.13
    "âĢİ"           -0.13
    "aska"          -0.13
    POSITIVE LOGITS
    Token           Logit
    " aspect"       0.34
    " idea"         0.30
    " factor"       0.30
    " phenomenon"   0.27
    " issue"        0.27
    " concept"      0.27
    " principle"    0.26
    "aspect"        0.26
    " thing"        0.25
    " angle"        0.25
    Activation Density: 0.337%

    No Known Activations