    Negative Logits
    Token           Logit
    "fortune"       -0.06
                    -0.06
    " Hồng"         -0.06
    "ورية"          -0.06
    "ीन"            -0.06
    "jist"          -0.06
    "ptr"           -0.06
    " Мон"          -0.06
                    -0.06
    " Importance"   -0.06
    Positive Logits
    Token           Logit
    "='"            0.07
    " toggle"       0.06
    " кост"         0.06
    " surgeon"      0.06
    "_groups"       0.06
    " naive"        0.06
    " Factor"       0.06
    ".optional"     0.06
    " ISSN"         0.06
    " spas"         0.06
    Activation Density: 0.015%

    No Known Activations