    Negative Logits
    ت  1.34
    ک  1.08
    ان  1.05
    the  1.05
    [blank]  1.05
    a  0.99
    ث  0.97
    [blank]  0.96
    ه  0.96
    [blank]  0.96
    Positive Logits
     pest  1.03
     τα  0.83
     leider  0.80
     την  0.79
     subir  0.79
     tony  0.78
     sabi  0.77
     beit  0.77
    {  0.77
     η  0.76
    Activation Density: 0.001%

    No Known Activations