INDEX
    Negative Logits
     moved
    -0.06
     له
    -0.06
     preca
    -0.06
     Parkway
    -0.06
    	counter
    -0.06
     школи
    -0.06
     Cont
    -0.06
    [l
    -0.06
     extremes
    -0.06
    [res
    -0.06
    Positive Logits
    idepress
    0.07
    đ
    0.07
    wort
    0.06
    0.06
     açık
    0.06
     comprise
    0.06
     tub
    0.06
    tv
    0.06
     cuck
    0.06
    _puts
    0.06
    Act Density 0.040%

    No Known Activations