INDEX
    Explanations

    references to social and historical issues, particularly those involving race and systemic injustices

    Negative Logits
    Token          Logit
    "üss"          -0.14
    "responses"    -0.13
    "endale"       -0.13
    " íĨµíķ´"      -0.13
    " Russell"     -0.13
    "empo"         -0.13
    " Elle"        -0.13
    "inin"         -0.13
    "oda"          -0.13
    " Approach"    -0.13
    Positive Logits
    Token          Logit
    "-themed"      0.38
    "-related"     0.33
    " themed"      0.30
    "-focused"     0.28
    "related"      0.24
    " related"     0.23
    ".related"     0.23
    "-theme"       0.22
    "_related"     0.22
    "ê´Ģ볨"        0.21
    Activation Density: 0.421%

    No Known Activations