    NEGATIVE LOGITS
    Token            Logit
    " ();"           -0.07
    " your"          -0.07
    " ----------"    -0.06
    " нет"           -0.06
    " remote"        -0.06
    " major"         -0.06
    " player"        -0.06
    "(ur"            -0.06
    " online"        -0.06
    "vw"             -0.06
    POSITIVE LOGITS
    Token            Logit
    "aint"           0.06
    " кар"           0.06
    "IAM"            0.06
    "occupied"       0.06
    "eft"            0.06
    " Bast"          0.06
    " آب"            0.06
    " ML"            0.06
    "<Model"         0.06
    " Yelp"          0.06
    Activation Density: 0.003%

    No Known Activations