INDEX
    Explanations
    Negative Logits
     negra        -0.08
    (CG           -0.08
     tac          -0.07
     Fa           -0.07
     irrev        -0.07
     राहत         -0.07
     عمر          -0.07
    .ACT          -0.07
     القر         -0.07
    (drop         -0.07
    Positive Logits
    anni           0.08
    Sher           0.07
     Grie          0.07
     небольшой     0.07
     Kop           0.07
     SOS           0.07
     Fab           0.07
                   0.07
     ironically    0.07
     servei        0.07
    Activation Density: 0.003%

    No Known Activations