    Explanations

    introductions and definitions

    Negative Logits
    𝘾            1.09
    as           1.08
    ist          1.05
    𝗯            0.94
    𝗕            0.91
    PRO          0.89
    CXX          0.88
    из           0.86
    miš          0.86
    elif         0.85
    Positive Logits
     bagaimana   1.06
     muut        0.96
     estão       0.95
    .            0.93
     arred       0.92
     cei         0.91
     এসে         0.91
     announcer   0.91
     gibt        0.90
     afficher    0.90
    Activation Density: 0.222%

    No Known Activations