INDEX
    Explanations

    mentions of infectious diseases, particularly HIV/AIDS and hepatitis

    Negative Logits
    enes          -0.16
     WHATSOEVER   -0.15
    zej           -0.15
    ORA           -0.14
    Ù쨧ÙĦ          -0.14
    loh           -0.14
    ÙIJÙĨ          -0.14
    coder         -0.14
    essel         -0.14
    oir           -0.13
    Positive Logits
    /AIDS         0.20
    öst           0.14
    -Compatible   0.14
     Flood        0.14
    Animated      0.14
    braco         0.14
    edImage       0.14
    onde          0.14
     super        0.14
    459           0.13
    Activation Density: 0.013%

    No Known Activations