    Explanations

    Keywords related to equality and discrimination, particularly identity attributes such as race, color, and sexual orientation.

    Negative Logits
    Token                 Logit
    "<eos>"               -0.67
    " in"                 -0.64
    "↵↵"                  -0.63
    "hyrchwyd"            -0.60
    "."                   -0.59
    (missing)             -0.56
    " ("                  -0.56
    " is"                 -0.54
    "])));"               -0.52
    " so"                 -0.50

    Positive Logits
    Token                 Logit
    " kasarigan"          1.13
    " itſelf"             1.00
    "protoimpl"           0.99
    " חיצוניים"           0.98
    " CURIAM"             0.97
    " Мексичка"           0.97
    " שוליים"             0.95
    " Administrativna"    0.90
    " kaynağından"        0.88
    "^(@)"                0.87
    Act Density 0.433%

    No Known Activations