    Explanations
    Negative Logits
        ong             1.17
        েইলি            1.02
        Ν               0.92
         alleges        0.91
         jeans          0.89
        DEFGHIJKLMNOP   0.89
        zám             0.88
         absorbs        0.88
         enclose        0.86
                        0.86
    Positive Logits
        ا               1.36
        ate             1.17
        ق               1.05
        𝗮               1.02
                        1.00
        おり            1.00
        ان              0.98
                        0.98
                        0.96
        Nicht           0.95
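    The two lists presumably rank the tokens whose output logits this feature pushes down (negative) or up (positive) most strongly. A common way to build such lists for a sparse-autoencoder feature is to project the feature's decoder direction through the model's unembedding matrix and take the extremes. The page does not say how its values were computed or scaled, so the following is only a sketch under that assumption; `top_logit_effects`, `w_dec`, and `w_u` are hypothetical names. It returns signed values, whereas the page lists the negative side without a minus sign.

```python
import numpy as np

def top_logit_effects(w_dec, w_u, vocab, k=10):
    """Rank tokens by the direct effect of one feature's decoder direction on their logits.

    Assumed (hypothetical) inputs:
      w_dec: (d_model,) decoder direction of the feature
      w_u:   (d_model, d_vocab) unembedding matrix
      vocab: list of d_vocab token strings
    """
    logits = w_dec @ w_u          # (d_vocab,) change in each token's logit per unit activation
    order = np.argsort(logits)    # token indices sorted from most negative to most positive
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with a 2-dimensional residual stream and a 3-token vocabulary.
neg, pos = top_logit_effects(
    np.array([1.0, 0.0]),
    np.array([[0.5, -2.0, 3.0], [9.0, 9.0, 9.0]]),
    ["a", "b", "c"],
    k=1,
)
print(neg, pos)
```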
    Act Density 0.208%
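    Activation density is usually read as the share of tokens on which a feature is active. The page does not state the dataset or the activation threshold, so the sketch below assumes any activation above 0 counts as active; `activation_density` is a hypothetical helper. Under that reading, 0.208% corresponds to roughly 2 active tokens per 1,000.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of tokens on which a feature's activation exceeds `threshold` (assumed 0)."""
    acts = np.asarray(acts, dtype=float)
    return float(np.mean(acts > threshold))

# One active token out of ten gives a density of 0.1 (10%).
print(activation_density([0, 0, 0.5, 0, 0, 0, 0, 0, 0, 0]))
```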

    No Known Activations