INDEX
    Explanations

    references to hate crimes and related terminology

    Negative Logits
    Token       Logit
    cken        -0.17
    nothrow     -0.15
    :animated   -0.14
    annon       -0.14
    ãĥ³ãĤ¹      -0.14
    yna         -0.14
    arters      -0.13
    iland       -0.13
    .dp         -0.13
    â̦â̦ãĢĤ     -0.13
    Positive Logits
    Token       Logit
    oldt        0.15
    izia        0.15
    ot          0.14
     UNS        0.14
    kee         0.14
    avenport    0.14
    sons        0.14
     oste       0.14
    .glide      0.14
    ëľ          0.13
    Activation Density: 0.018%

    No Known Activations