    Negative Logits
    fect  -0.07
     узн  -0.07
     mod  -0.06
     defect  -0.06
    _scaled  -0.06
    ']],  -0.06
     Ember  -0.06
    (no token shown)  -0.06
     noqa  -0.06
     شف  -0.06
    Positive Logits
    roke  0.07
    ськ  0.07
    .team  0.06
     strategist  0.06
    TM  0.06
    _minute  0.06
    {@  0.06
    dy  0.06
    ernet  0.06
    :request  0.06
    Activation Density: 0.004%

    No Known Activations