INDEX
    NEGATIVE LOGITS
     ↵            ↵      -0.07
    atsu                 -0.07
    "Not                 -0.07
     Riding              -0.07
     Translator          -0.07
     difer               -0.07
    bx                   -0.07
     Não                 -0.07
    coef                 -0.06
     PIL                 -0.06

    POSITIVE LOGITS
    -blind               0.07
     merciless           0.06
     bachelor            0.06
     unzip               0.06
     names               0.06
     mohli               0.06
     нами                0.06
     interviewing        0.06
     фон                 0.06
     recommending        0.06
    Activation Density: 0.001%

    No Known Activations