    Negative Logits
        " by"                 0.91
        "يه"                  0.87
        " for"                0.85
        "ње"                  0.84
        " ،"                  0.82
        (token not shown)     0.79
        "$,"                  0.77
        "،"                   0.76
        "ی"                   0.74
        " a"                  0.73
    Positive Logits
        "an"                  1.44
        "ان"                  1.42
        "t"                   1.28
        "n"                   1.16
        "r"                   1.13
        "soever"              1.01
        "on"                  0.98
        "how"                 0.95
        "p"                   0.94
        "that"                0.93
    Activation Density: 0.638%

    No Known Activations