INDEX
    Explanations

    lightweight

    Negative Logits
    _ENT        -0.09
     Rhe        -0.08
    _ent        -0.07
     nive       -0.07
     horn       -0.07
    Ent         -0.07
     mixed      -0.07
     Ent        -0.07
    ENTS        -0.07
     meng       -0.06
    Positive Logits
    ened        0.09
     chóng      0.09
     للغاية     0.08
     ఉండ        0.08
    eners       0.08
    &e          0.08
                0.08
    elassen     0.08
    /color      0.08
     cuotas     0.08
    Act Density 0.008%

    No Known Activations