    Negative Logits
        Token             Logit
        " başarı"         -0.07
        " Reaction"       -0.06
        " Komment"        -0.06
        " लक"             -0.06
        " Sar"            -0.06
        "Congress"        -0.06
        " Bei"            -0.06
        " resale"         -0.06
        "\controllers"    -0.06
        " SL"             -0.06
    Positive Logits
        Token             Logit
        " وظ"             0.07
        "]!="             0.06
        "zs"              0.06
        ".coordinates"    0.06
        (not shown)       0.06
        " нег"            0.06
        "(){}↵"           0.06
        "skb"             0.06
        "empt"            0.06
        "ющих"            0.06
    Activation Density: 0.001%

    No Known Activations