INDEX
    Explanations

    references to vulnerable individuals or groups in various contexts

    Negative Logits
    tron
    -0.16
    nd
    -0.16
     Stateless
    -0.15
    ntl
    -0.15
    ually
    -0.14
    uel
    -0.14
    yen
    -0.14
    wang
    -0.13
    ารถ
    -0.13
    aries
    -0.13
    Positive Logits
     who
    0.16
     же
    0.16
    -ci
    0.15
    zelf
    0.14
    ύ
    0.14
     same
    0.14
     Marcus
    0.14
    ори
    0.13
    errat
    0.13
    dsn
    0.13
    Act Density 0.053%

    No Known Activations