    Negative Logits
    creates     -0.08
     PM         -0.07
    adjust      -0.07
    members     -0.07
                -0.07
     Peggy      -0.07
    Coll        -0.07
    pygame      -0.06
    .exists     -0.06
    .ge         -0.06
    Positive Logits
                     0.07
     beverage        0.07
     Configuration   0.06
     sahip           0.06
     orientation     0.06
     SAFE            0.06
    'am              0.06
     стратег         0.06
     Dtype           0.06
     вов             0.05
    Act Density 0.029%

    No Known Activations