INDEX
    Explanations

    mentions of LGBTQ+ related terms and discrimination issues

    Negative Logits
        " corrom"            -0.54
        " huma"              -0.49
        " dépens"            -0.48
        " engend"            -0.48
        " peup"              -0.47
        "CodedInputStream"   -0.46
        " vulga"             -0.45
        " dépass"            -0.45
        " répand"            -0.45
        " palab"             -0.44
    Positive Logits
        " trecut"             0.58
        " dă"                 0.54
        " disambiguazione"    0.52
        " Audiodateien"       0.51
        " blest"              0.51
        "SBATCH"              0.51
        "üedad"               0.50
        " împre"              0.49
        "jectures"            0.49
        " disqual"            0.49
    Activation Density: 0.394%

    No Known Activations