INDEX
    Explanations

    references to hate crimes and violent offenses

    Negative Logits
    imli          -0.16
    ghan          -0.16
     Pitch        -0.16
    agrant        -0.15
    MMdd          -0.15
    .pitch        -0.15
    Escort        -0.15
    .sg           -0.14
    رÙĪØ¬        -0.14
    ugs           -0.14
    Positive Logits
    仲            0.16
    ç͍åĵģ        0.16
    .ReadString   0.14
     contr        0.14
    åĩĮ           0.14
    941           0.13
     Messenger    0.13
    Desk          0.13
    iband         0.13
    USH           0.13
    Act Density 0.005%

    No Known Activations