INDEX
    Explanations

    names or proper nouns related to political and community figures

    instances of proper nouns and specific names or titles

    Negative Logits
    REDACTED      -0.65
     negro        -0.57
     QC           -0.54
    :-            -0.50
     FF           -0.50
     myster       -0.50
    ç«            -0.50
    ©             -0.49
     moot         -0.49
     manifold     -0.48
    Positive Logits
     apologized   0.63
    Enlarge       0.58
    cohol         0.55
    rouse         0.54
    awaru         0.54
    packing       0.52
    "}],"         0.52
     apologize    0.51
    arnaev        0.51
     Healthy      0.51
    Activation Density: 1.078%

    No Known Activations