INDEX
    Negative Logits
    ","             1.13
    " r"            1.03
    "ı"             1.03
    "dır"           1.00
    "א"             0.99
    " hallway"      0.88
    "اً"            0.87
    " mandarin"     0.87
    "ite"           0.86
    "ır"            0.85
    Positive Logits
    1.47
    is
    1.21
    1.20
    as
    1.15
     atuais
    1.11
    یا
    1.09
    forEach
    1.09
    m
    1.06
    ۰
    1.06
     titulares
    1.04
    Act Density 0.022%

    No Known Activations