INDEX
    Negative Logits
    ść
    -0.07
     Paint
    -0.07
     perfume
    -0.06
     собою
    -0.06
     Peru
    -0.06
    ันน
    -0.06
     سیستم
    -0.06
     عن
    -0.06
    írk
    -0.06
     microphone
    -0.06
    Positive Logits
    lamak
    0.07
     honorable
    0.07
    divid
    0.07
    liga
    0.07
     Independ
    0.06
    0.06
    0.06
    ALLERY
    0.06
     ослож
    0.06
    _extractor
    0.06
    Activation Density 0.069%

    No Known Activations