INDEX
    Explanations
    Negative Logits
    я  1.73
    o  1.70
    ar  1.59
    د  1.51
    es  1.47
    it  1.45
    ling  1.37
    ag  1.36
    ic  1.31
    ens  1.27
    Positive Logits
    (token missing)  1.32
    ্লাহ  1.26
     encour  1.23
    ε  1.20
    (token missing)  1.20
    𝓑  1.18
    𝒮  1.16
     ﺍﻟ  1.13
    (token missing)  1.12
     condemn  1.11
    Activation Density: 0.113%

    No Known Activations