INDEX
    Explanations
    NEGATIVE LOGITS
     confiance        -0.09
     EPL              -0.08
     espec            -0.08
     Fay              -0.08
    dto               -0.08
     eph              -0.08
     EFI              -0.08
     negoci           -0.08
     Beijing          -0.08
     estime           -0.08
    POSITIVE LOGITS
     pesky            0.09
     unconventional   0.09
     quirky           0.09
    强调              0.08
     vermijden        0.08
     edgy             0.08
    另类              0.08
     banned           0.08
     prohibition      0.08
     weird            0.08
    Activation Density: 0.006%

    No Known Activations