    Explanations

    The neuron activates on racial group labels in demographic data (e.g., "White" or "African American").

    Negative Logits
    Token            Logit
    " Kardash"       -0.08
    "Demand"         -0.07
    (not shown)      -0.06
    "šov"            -0.06
    " dul"           -0.06
    "amac"           -0.06
    ".onerror"       -0.06
    " bdsm"          -0.06
    "PTY"            -0.06
    "AxisSize"       -0.06
    Positive Logits
    Token            Logit
    "rk"             0.06
    " цвета"         0.06
    "(connection"    0.06
    " پیش"           0.06
    "ti"             0.06
    "mia"            0.06
    (not shown)      0.06
    " neighborhood"  0.06
    "-LAST"          0.06
    "(token"         0.06
    Act Density 0.001%

    No Known Activations