    Negative Logits
    Token            Logit
    " restriction"   -0.08
    "alta"           -0.06
    " Dum"           -0.06
    "hair"           -0.06
    " پرداز"         -0.06
    "хи"             -0.06
    ".Restr"         -0.06
    "iba"            -0.06
    "ετ"             -0.06
    "ioned"          -0.06
    Positive Logits
    Token            Logit
    " fart"           0.06
    " TLabel"         0.06
    " errone"         0.06
    "tournament"      0.06
    " oppression"     0.06
                      0.06
    ".Commit"         0.06
    " пад"            0.05
    "))↵"             0.05
    " flashlight"     0.05
    Activation Density: 0.007%

    No Known Activations