INDEX
    Explanations

    references to political and social issues related to race and identity

    Negative Logits
    "ilyn"       -0.16
    "moon"       -0.16
    "iversit"    -0.15
    "سط"         -0.15
    " æħ"        -0.15
    "♀"          -0.14
    "製作"       -0.14
    "rtle"       -0.14
    "mist"       -0.14
    "膜"         -0.14
    Positive Logits
    "jí"             0.18
    "ourg"           0.18
    "rog"            0.16
    "CommandLine"    0.15
    "Atlas"          0.15
    " Guinness"      0.14
    " Primer"        0.14
    "ún"             0.14
    "ares"           0.14
    "ř"              0.14
    Act Density 0.185%

    No Known Activations