    Explanations

    references to human trafficking and related labor abuses

    Negative Logits
    Token         Logit
    "emens"       -0.17
    "ahren"       -0.15
    "isure"       -0.15
    "raž"         -0.14
    "åı"          -0.14
    "anki"        -0.14
    "awe"         -0.14
    "Č"           -0.14
    "ackbar"      -0.14
    " Carroll"    -0.13
    Positive Logits
    Token         Logit
    " ter"        0.16
    " chatt"      0.15
    "bst"         0.14
    "Äįem"        0.14
    " PIT"        0.14
    "warts"       0.14
    " into"       0.14
    "CKER"        0.13
    "ÃŃm"         0.13
    " pit"        0.13
    Activation Density: 0.019%

    No Known Activations