    Explanations

    racial bias

    This neuron detects mentions of race and racial-group topics, especially content about racial identity, discrimination, representation, or related controversies.

    Negative Logits
        Token          Logit
        " leaf"        -0.08
        " losse"       -0.08
        " folos"       -0.08
        " postfix"     -0.07
        "roy"          -0.07
        " borrowed"    -0.07
        " topo"        -0.07
        " אית"         -0.07
        " overw"       -0.07
        "ויה"          -0.07
    Positive Logits
        Token              Logit
        " racial"          0.20
        "racial"           0.17
        "黑人"             0.16   (Chinese, "Black people")
        " racism"          0.16
        " ethnic"          0.16
        " ethnicity"       0.15
        " racist"          0.15
        " minorities"      0.14
        " multicultural"   0.14
        " LGBTQ"           0.14
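    The two lists above rank vocabulary tokens by how strongly this neuron pushes their logits up or down. The page does not say how these scores were computed; a common approach is direct logit attribution, where the neuron's output weight vector is projected onto the model's unembedding matrix. The sketch below assumes that approach, ignores the final layer norm and biases, and uses a hypothetical helper `top_logits` with made-up toy weights.

```python
from typing import List, Sequence, Tuple


# Hypothetical helper: ranks vocabulary tokens by a neuron's direct effect on the logits.
# Simplification (assumption): ignores the final layer norm and any bias terms.
def top_logits(
    w_out: Sequence[float],
    w_u: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (top-k positive, top-k negative) lists of (token, score) pairs.

    w_out: the neuron's output weight vector, length d_model.
    w_u:   unembedding matrix with d_model rows and len(vocab) columns.
    """
    d_model = len(w_out)
    # score[t] = sum over i of w_out[i] * W_U[i][t]  (one column per token)
    scores = [
        sum(w_out[i] * w_u[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative scores first
    positive = ranked[::-1][:k]  # most positive scores first
    return positive, negative


# Toy example with made-up weights (not the real model's values).
vocab = [" racial", " leaf", " ethnic", " postfix"]
w_out = [1.0, 0.5]
w_u = [
    [0.2, -0.1, 0.1, -0.05],
    [0.0, 0.0, 0.1, -0.05],
]
positive, negative = top_logits(w_out, w_u, vocab, k=2)
print(positive)
print(negative)
```

    Leading spaces in tokens such as " racial" are kept because many tokenizers treat " racial" and "racial" as distinct tokens, which is why both appear in the positive list with different scores.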
    Activation Density: 0.291%
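    Activation density is usually read as the share of token positions on which the neuron is active. A minimal sketch, assuming "active" means an activation strictly above a threshold of 0.0 (a hypothetical choice; the page does not state its threshold or corpus), with an illustrative helper named `activation_density`:

```python
from typing import Iterable


# Hypothetical helper: percentage of token positions where the neuron "fires".
# Assumption: "active" means activation > threshold (threshold defaults to 0.0).
def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("activation_density needs at least one activation")
    return 100.0 * active / total


# Toy example: 3 active positions out of 1,000 tokens.
acts = [0.0] * 997 + [1.2, 0.4, 2.5]
density = activation_density(acts)
print(f"{density:.3f}%")
```

    At the reported 0.291%, the neuron would be active on roughly 3 of every 1,000 tokens (0.291% of 1,000 is about 2.9), which fits a narrowly topical feature like the one described above.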

    No Known Activations