INDEX
    NEGATIVE LOGITS
    Token            Value
    ")^{[\"          0.37
    " Ху"            0.36
    "trad"           0.35
    " Hik"           0.35
    "gut"            0.35
    " Mathematik"    0.35
    "ogados"         0.34
    "សម្រាប់"          0.34
    " trad"          0.33
    "📕"             0.33
    POSITIVE LOGITS
    Token      Value
    " gw"      0.84
    " GW"      0.83
    " Gw"      0.77
    "Gw"       0.76
    "GW"       0.73
    "gw"       0.67
    " Gew"     0.65
    "Gew"      0.64
    " gew"     0.53
    " Gwen"    0.48
    Act Density 0.002%

    No Known Activations