INDEX
    Explanations

    rich, requires, increased, quantitative

    Negative Logits
    {          0.79
    I          0.78
     ذلك       0.75
     of        0.74
    Aby        0.72
     ب         0.71
    (          0.71
     وأ        0.71
    StartZ     0.71
               0.69
    Positive Logits
    t          1.16
    л          1.10
    ä          1.03
    ą          0.97
               0.96
    il         0.95
    a          0.95
    ı          0.94
    is         0.92
    f          0.91
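The two logit tables list the tokens whose output logits this feature most raises (positive) and most lowers (negative). A common way to build such tables, assumed here and not confirmed by this page, is to project the feature's decoder direction through the model's unembedding matrix and take the top and bottom k entries. The sketch below uses hypothetical names (`decoder_dir`, `unembed`, `vocab`) and toy numbers. It returns signed values, so the bottom-k entries come out negative.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Dot the feature's decoder direction with every vocabulary column.

    Assumed (hypothetical) layout: `unembed` is d_model rows by vocab_size
    columns, and `decoder_dir` has length d_model.
    """
    if len(unembed) != len(decoder_dir):
        raise ValueError("unembed must have one row per model dimension")
    vocab_size = len(unembed[0])
    return [sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
            for j in range(vocab_size)]


def top_and_bottom(effects: Sequence[float], vocab: Sequence[str], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k largest (positive) and k smallest (negative) effects.

    The two lists can overlap when 2 * k exceeds the vocabulary size.
    """
    ranked = sorted(zip(vocab, effects), key=lambda p: p[1], reverse=True)
    positive = ranked[:k]
    # Most negative first, matching a "negative logits" listing.
    negative = sorted(ranked[-k:], key=lambda p: p[1])
    return positive, negative


if __name__ == "__main__":
    # Toy example: 2-dimensional model, 4-token vocabulary.
    effects = logit_effects([1.0, -0.5],
                            [[1.2, 0.0, -0.8, 0.5],
                             [0.0, 2.0, 0.0, -1.0]])
    pos, neg = top_and_bottom(effects, ["a", "b", "c", "d"], k=2)
    print("positive:", pos)
    print("negative:", neg)
```

With the toy inputs, tokens `a` and `d` come out on top and `b` and `c` at the bottom. A real dashboard would also map token ids back to strings, which is why tokens carrying a leading space (such as " of") appear with that space intact.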
    Act Density 0.077%
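"Act Density" is read here, as an assumption, as the share of token positions on which the feature's activation is nonzero. Under that reading, 0.077% means the feature fires on roughly 77 of every 100,000 tokens, making it a very sparse feature. A minimal sketch of the calculation follows. The function name and the `threshold` parameter are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float],
                       threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumption: a feature counts as "active" when its activation is strictly
    greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # Toy data: 77 active positions out of 100,000 tokens.
    acts = [0.0] * 100_000
    for i in range(77):
        acts[i * 1000] = 1.5
    print(f"{activation_density(acts):.3f}%")
```

Here the threshold is exposed as a parameter because some pipelines ignore tiny positive activations. The default of 0.0 matches the plain "nonzero" reading.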

    No Known Activations