INDEX
    Negative Logits
    Token           Value
     on             1.00
     a              0.91
    ר               0.89
     y              0.89
     e              0.88
    the             0.88
     i              0.86
     o              0.86
     in             0.85
    ا               0.85
    Positive Logits
    Token           Value
    '               1.16
    of              0.99
     of             0.95
                    0.88
    <unused2231>    0.86
     của            0.85
    ного            0.84
     của            0.81
    Of              0.79
    j               0.79
    Act Density 1.378%

    No Known Activations