INDEX
    Explanations
    Negative Logits
    " Доп"        -0.08
    " пут"        -0.08
    "806"         -0.07
    " Curry"      -0.07
    " CRO"        -0.07
    "mc"          -0.07
    "iyim"        -0.07
    "agr"         -0.07
    "idente"      -0.07
    "aside"       -0.07
    Positive Logits
    " entier"     0.08
    " jointly"    0.07
    " bishops"    0.07
    " Azure"      0.07
    " Carol"      0.07
    " hul"        0.07
    " fath"       0.07
    " pubblico"   0.07
    "Harmony"     0.07
    " ow"         0.07
    Activation Density 0.004%

    No Known Activations