INDEX
    Negative Logits
    راض    1.97
    ament    1.91
    rain    1.68
    ریض    1.64
    ंपरा    1.64
    enche    1.63
    istice    1.62
    𝚘    1.60
    𝚛    1.58
    1.57
    Positive Logits
    י    3.01
    s    2.80
    ところに    2.51
    ি    2.45
    sley    2.41
    ARKS    2.40
    erful    2.38
    Rounded    2.30
    som    2.29
    加坡    2.28
    Activation Density: 1.844%

    No Known Activations