INDEX
    Explanations
    Negative Logits
    -1.45
    ַּ
    -1.23
    ֶּ
    -1.05
     تغي
    -1.01
    ровна
    -0.95
     ดอก
    -0.94
    ִּ
    -0.93
     lepiej
    -0.93
    Mann
    -0.91
    ָר
    -0.90
    Positive Logits
     to
    1.26
    ׇ
    1.18
    !,
    1.05
     millón
    1.00
    Hilsen
    0.99
    batman
    0.96
     Институ
    0.96
     chré
    0.95
     tem
    0.93
     ç
    0.93
    Act Density 0.000%

    No Known Activations