INDEX
    Negative Logits
    Token          Logit
    " box"         -0.07
    "GRESS"        -0.07
    " harmed"      -0.06
    " Congress"    -0.06
    " جی"          -0.06
    " attack"      -0.06
    " aesthetic"   -0.06
    " welcomed"    -0.06
    " pace"        -0.06
    "hospital"     -0.06
    Positive Logits
    Token                   Logit
    " MCC"                  0.07
    "(PDO"                  0.07
    "MASConstraintMaker"    0.06
    ")>↵"                   0.06
    ")),"                   0.06
    " BSD"                  0.06
    " redeemed"             0.06
    "<src"                  0.06
    "((__"                  0.06
    "ς"                     0.06
    Activation Density: 0.003%

    No Known Activations