    Negative Logits
    Token           Logit
    " Č"            -0.08
    "innik"         -0.08
    "clin"          -0.08
    " flagged"      -0.07
    " begleiten"    -0.07
    "-step"         -0.07
    "003"           -0.07
    "146"           -0.07
    "206"           -0.07
    "002"           -0.07
    Positive Logits
    Token           Logit
    " asupra"       0.08
    "Mur"           0.08
    " kanya"        0.07
    " wrought"      0.07
    (token missing) 0.07
    " telo"         0.07
    " аг"           0.07
    " Thou"         0.07
    " нее"          0.07
    " propi"        0.07
    Activation Density: 0.053%

    No Known Activations