INDEX
    Explanations
    Negative Logits
    Token           Logit
    " cola"         -0.07
    "Week"          -0.07
    " blatantly"    -0.06
    "958"           -0.06
    " دهه"          -0.06
    "_ok"           -0.06
    " haha"         -0.06
    "pred"          -0.06
    " باغ"          -0.06
    "جز"            -0.06

    Positive Logits
    Token           Logit
    " yeterli"      0.07
    " Hor"          0.06
    " butcher"      0.06
    " sonrası"      0.06
    " Mant"         0.06
    "liced"         0.06
    "δε"            0.06
    " tamp"         0.06
    "/person"       0.06
    "\twork"        0.06
    Activation Density: 0.000%

    No Known Activations