INDEX
    NEGATIVE LOGITS
    " on"           -0.96
    "Geplaatst"     -0.94
    " فريبيس"       -0.77
    " asupra"       -0.76
    " étoient"      -0.76
    " itſelf"       -0.76
    " faſt"         -0.76
    " Italijani"    -0.75
    "sproz"         -0.75
    "atguigu"       -0.74
    POSITIVE LOGITS
    "QName"         0.50
    "bo"            0.46
    "batik"         0.45
    "-"             0.44
    " these"        0.44
    " the"          0.43
    "cle"           0.42
    "AndEndTag"     0.42
    " behalf"       0.41
    " mati"         0.41
    Activation Density: 1.580%

    No Known Activations