    Explanations

    texts related to crimes, particularly those motivated by hate or aggression

    Negative Logits
        Token        Logit
        "burgh"      -0.08
        "å¥Ķ"        -0.07
        "ılÄ±ÅŁ"     -0.07
        "ampo"       -0.07
        "nett"       -0.07
        "yre"        -0.07
        "crets"      -0.07
        "olec"       -0.06
        "rrha"       -0.06
        "åīĽ"        -0.06
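    Several of the negative-logit tokens (å¥Ķ, åīĽ, ılÄ±ÅŁ) look like mojibake. They read as the GPT-2 byte-level BPE display form, in which every raw byte is shown as a printable Unicode character. The page does not name the model, so the sketch below assumes a GPT-2-style byte-level vocabulary and inverts that byte-to-character table to recover the underlying UTF-8 text. The helper names are hypothetical, and the decoder should not be applied to tokens such as ýš, which contain characters outside the table.

```python
def bytes_to_unicode() -> dict[int, str]:
    """GPT-2-style reversible table mapping each byte to a printable character."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte (control chars, space, etc.) is shifted to 256 + n.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Invert the table so a displayed token string can be turned back into bytes.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Map a displayed byte-level token to text; invalid byte runs become U+FFFD."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


for tok in ["å¥Ķ", "åīĽ", "ılÄ±ÅŁ"]:
    print(repr(tok), "->", repr(decode_token(tok)))
```

    Under that assumption, å¥Ķ decodes to 奔 and åīĽ to 剛. ılÄ±ÅŁ starts with a lone UTF-8 continuation byte (0x8F), so it decodes to a replacement character followed by "lış" and is only a fragment of a longer word.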
    Positive Logits
        Token        Logit
        "asco"       0.07
        "udy"        0.06
        " Stocks"    0.06
        "970"        0.06
        " deferred"  0.06
        "defer"      0.06
        "翼"          0.06
        "ýš"         0.06
        "_ptr"       0.05
        " XP"        0.05
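    On SAE-style feature dashboards, positive and negative logit lists like these are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and keeping the k highest and lowest scores. This page does not state its method, so what follows is a minimal pure-Python sketch under that assumption. The function name, the argument layout (unembedding stored as d_model rows by vocabulary columns), and the toy numbers are all hypothetical.

```python
def top_logit_tokens(decoder_row, unembed, vocab, k=10):
    """Rank vocabulary tokens by a feature's direct effect on the logits.

    decoder_row: the feature's decoder vector (length d_model).
    unembed: unembedding matrix as d_model rows x len(vocab) columns.
    Returns (lowest, highest): lists of (token, score), most extreme first.
    """
    # Dot product of the decoder vector with each vocabulary column.
    scores = [
        sum(d * w for d, w in zip(decoder_row, column))
        for column in zip(*unembed)
    ]
    order = sorted(range(len(vocab)), key=lambda j: scores[j])
    lowest = [(vocab[j], scores[j]) for j in order[:k]]
    highest = [(vocab[j], scores[j]) for j in reversed(order[-k:])]
    return lowest, highest


# Toy example: d_model = 2 and a vocabulary of four hypothetical tokens.
vocab = ["a", "b", "c", "d"]
decoder_row = [1.0, 2.0]
unembed = [
    [3.0, -1.0, 0.0, 2.0],
    [0.0, 1.0, -2.0, 1.0],
]
neg, pos = top_logit_tokens(decoder_row, unembed, vocab, k=2)
print("lowest:", neg)
print("highest:", pos)
```

    Whether a final normalization is applied before the projection varies between tools; this sketch applies none, so the scores are raw dot products.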
    Activation Density 0.001%
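    Activation density is presumably the fraction of sampled tokens on which the feature fires, i.e. has an activation above zero. A density of 0.001% corresponds to roughly one token in every 100,000. The sketch below computes that fraction, assuming a flat list of per-token activations and a firing threshold of zero; both the threshold and the function name are assumptions for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy data: 2 firing tokens out of 200,000 sampled tokens.
acts = [0.0] * 200_000
acts[10] = 1.3
acts[150_000] = 0.7
density = activation_density(acts)
print(f"{density:.3%}")  # 0.001%
```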

    No Known Activations