    Negative Logits
        "л"         1.27
        "ق"         1.26
        "د"         1.24
        "b"         1.23
        "д"         1.23
        "UT"        1.15
        "ó"         1.13
        "א"         1.11
        "什么"      1.09
        " on"       1.08
    Positive Logits
        " a"                    1.41
        [token not rendered]    1.30
        "मध्ये"                 1.23
        " основ"                1.21
        "τα"                    1.16
        " в"                    1.13
        " macam"                1.13
        " anos"                 1.11
        [token not rendered]    1.09
        " شما"                  1.09
    Activation Density: 0.010%

    No Known Activations