INDEX
    Explanations

    references to the black community

    Negative Logits
    fram      -0.16
     italian  -0.14
    FRING     -0.14
    gypt      -0.14
    CKET      -0.14
    cona      -0.13
    ignal     -0.13
    ardo      -0.13
    Ùī        -0.13
    Ø·ÙĦ      -0.13
    Positive Logits
     Western  0.16
    ornings   0.16
    olon      0.15
     proudly  0.14
     Bik      0.14
    -cols     0.14
              0.14
     tens     0.13
    ))))      0.13
    uffles    0.13
    Act Density 0.000%

    No Known Activations