    Negative Logits
    "(menu"  -0.08
    "rejected"  -0.06
    "Segoe"  -0.06
    " judged"  -0.06
    "‌ی"  -0.06
    (token text missing)  -0.06
    "Tokens"  -0.06
    "!(↵"  -0.06
    "yg"  -0.06
    " мають"  -0.06
    Positive Logits
    " Пра"  0.07
    "_DEFINED"  0.07
    " جستارهای"  0.06
    " ngOn"  0.06
    "_Map"  0.06
    ",nil"  0.06
    " glyph"  0.06
    " HK"  0.06
    "_paper"  0.06
    (token text missing)  0.06
    Activation Density: 0.174%

    No Known Activations