INDEX
    Explanations
    Negative Logits
    Token     Logit
    "("       0.38
    "{"       0.37
    "rie"     0.32
    " {"      0.31
    ")."      0.31
    ");"      0.30
    "_)"      0.29
    " ("      0.29
    "۰"       0.29
    "که"      0.29
    Positive Logits
    Token     Logit
    "in"      0.47
    "ر"       0.41
    "an"      0.41
    "ل"       0.40
    (blank)   0.40
    "ан"      0.39
    "на"      0.39
    "as"      0.39
    (blank)   0.38
    "ח"       0.38
    Act Density 0.703%

    No Known Activations