    Negative Logits
    ิน
    -0.09
     নি�
    -0.08
    ensos
    -0.08
     totaling
    -0.08
     آب
    -0.07
     उड़
    -0.07
    holz
    -0.07
     paura
    -0.07
    verg
    -0.07
    arbeit
    -0.07
    Positive Logits
     Kas
    0.08
     parties
    0.08
    0.07
     diss
    0.07
     Diss
    0.07
    represented
    0.07
     Dion
    0.07
    τύ
    0.07
     Represent
    0.07
    çado
    0.07
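    The two ranked lists above pair each token with a signed logit value. The page does not say how these values are computed. A common convention is to take the dot product of the feature's decoder direction with each token's column of the unembedding matrix, then sort. The sketch below assumes that convention. The helper names `logit_effects` and `top_and_bottom`, the decoder direction, and the four-token vocabulary are all hypothetical toy values.

```python
# Minimal sketch, ASSUMING logit values = decoder direction . unembedding column.
# All names and numbers here are hypothetical; d_model = 2, vocabulary of 4 tokens.
def logit_effects(decoder_dir, unembed):
    """Dot product of a feature's decoder direction with each token's unembedding column."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }


def top_and_bottom(effects, k):
    """Return the k most negative and the k most positive (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]                   # most negative first
    positive = list(reversed(ranked))[:k]   # most positive first
    return negative, positive


if __name__ == "__main__":
    decoder_dir = [1.0, -0.5]               # hypothetical feature direction
    unembed = {                             # hypothetical W_U columns, one per token
        "a": [0.1, 0.0],
        "b": [0.0, 0.2],
        "c": [0.05, 0.05],
        "d": [-0.02, 0.1],
    }
    neg, pos = top_and_bottom(logit_effects(decoder_dir, unembed), 2)
    print("negative:", neg)
    print("positive:", pos)
```

    With these toy vectors, "b" and "d" come out as the most negative tokens, and "a" and "c" come out as the most positive.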
    Activation density: 0.008%
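    Activation density is presumably the percentage of tokens in some evaluation corpus on which the feature fires. The page names neither the corpus nor the firing threshold, so the sketch below assumes "fires" means an activation strictly above 0. At 0.008%, the feature would fire on about 8 tokens in every 100,000. The function name `activation_density` is hypothetical.

```python
# Minimal sketch, ASSUMING density = share of tokens with activation > threshold (default 0).
def activation_density(activations, threshold=0.0):
    """Percentage of tokens on which the feature's activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one token")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # Hypothetical corpus: 8 firing tokens out of 100,000.
    acts = [0.0] * 99_992 + [1.0] * 8
    print(f"{activation_density(acts):.3f}%")
```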

    No Known Activations