INDEX
    Explanations
    Negative Logits
    " P"            0.51
    " crim"         0.47
    " criminally"   0.46
    "con"           0.43
    ","             0.43
    " as"           0.42
    " over"         0.42
    " g"            0.42
    " crime"        0.42
    " criminal"     0.41
    Positive Logits
    "irrahim"               0.50
    (token not rendered)    0.50
    " وهذه"                 0.48
    (token not rendered)    0.45
    " yanlı"                0.41
    "యో"                    0.41
    "ума"                   0.41
    "ванию"                 0.41
    (token not rendered)    0.41
    " இவை"                  0.40
    Activation Density 0.004%

    No Known Activations