    Negative Logits
    Token         Logit
    " bullied"    -0.07
    "otions"      -0.07
    "urt"         -0.07
    "eceğini"     -0.07
    "ladık"       -0.07
    "\tpw"        -0.06
    " festival"   -0.06
    "avers"       -0.06
    "itional"     -0.06
    "�ng"         -0.06
    Positive Logits
    Token         Logit
    " имеют"      0.12
    " имеет"      0.10
    " име"        0.09
    " мають"      0.08
    "Exports"     0.07
    "име"         0.07
    "ấm"          0.07
    " MSE"        0.06
    "μμα"         0.06
    "经济"        0.06
    Activation Density: 0.005%

    No Known Activations