    NEGATIVE LOGITS
    Token      Logit
    "'"        1.59
    "ر"        1.58
    "p"        1.51
    "ur"       1.44
    "ut"       1.41
    "ad"       1.37
    "r"        1.34
    "woven"    1.29
    "ور"       1.27
    "ح"        1.21
    POSITIVE LOGITS
    Token             Logit
    " on"             1.28
    (not captured)    1.12
    " lanz"           1.10
    (not captured)    1.05
    " făcut"          1.02
    " estern"         1.00
    (not captured)    1.00
    " bilo"           0.96
    "ו"               0.96
    " an"             0.95
    Act Density 0.000%

    No Known Activations