INDEX
    Explanations
    Negative Logits
    كات           -0.07
     stated       -0.07
    oples         -0.07
     уник         -0.06
     ότι          -0.06
     Ста          -0.06
     яка          -0.06
                  -0.06
     spanking     -0.06
     spat         -0.06
    Positive Logits
    	ev           0.07
     uns          0.07
    Ye            0.07
     brilliant    0.07
     intelligent  0.07
    !!↵           0.06
    %"↵           0.06
     Finnish      0.06
     noch         0.06
     dismiss      0.06
    Act Density 0.090%

    No Known Activations