INDEX
    Explanations
    Negative Logits
    "𝚔"            0.69
    " عدم"         0.64
    " Zustimmung"  0.64
    "aloko"        0.63
    " INVEST"      0.63
    " القسمه"      0.63
    "𝐚"            0.63
    " privés"      0.62
    " সুবিধা"      0.61
    " ب"           0.61
    Positive Logits
    "ucius"        0.59
    "르는"         0.56
    "iae"          0.55
                   0.55
    "an"           0.55
    "ia"           0.54
    "But"          0.54
    " But"         0.54
    "de"           0.53
    "rii"          0.52
    Activation Density: 0.001%

    No Known Activations