    Negative Logits
    Token    Logit
    (blank)  1.08
    ها       1.07
    ul       1.03
    ad       1.01
    ًا       1.01
    ا۔       1.01
    ן        0.96
    in       0.93
    ர்       0.93
    ında     0.89
    Positive Logits
    Token    Logit
    '        1.63
    -        1.36
    (blank)  1.17
    }        1.09
    _        1.07
    "        0.96
    *        0.93
    ve       0.89
    been     0.88
    }$       0.88
    Activation Density: 0.001%

    No Known Activations