INDEX
    Explanations

    "your" followed by a description

    Negative Logits
    Token         Value
    י             2.58
                  2.41
    й             2.27
    dır           2.19
    yya           2.03
    ه             1.95
    ي             1.92
    𝘦             1.92
    𝘵             1.86
    ות            1.85
    Positive Logits
    Token         Value
    ri            2.25
     Lordships    2.13
    n             2.09
    ent           2.03
    },            1.99
    ling          1.96
    MA            1.96
    ir            1.93
    ra            1.93
    ur            1.91
    Activation Density: 0.251%

    No Known Activations