    Negative Logits
    ت
    1.02
     зрения
    0.97
    ాలు
    0.94
    s
    0.90
    ب
    0.90
     внимания
    0.88
     estím
    0.86
    <0x80>
    0.84
    ۔
    0.79
    تي
    0.79
    Positive Logits
     as
    1.09
    ри
    1.02
    ه‌ها
    1.02
    0.98
    ची
    0.95
    ست
    0.95
    0.93
    ला
    0.91
    ни
    0.91
    ell
    0.88
    Activation Density: 0.143%

    No Known Activations