    Negative Logits
     bert          -0.08
    abouts         -0.08
    רח             -0.07
     recreation    -0.07
     potrz         -0.07
     Delphi        -0.07
    Lore           -0.07
     социальных    -0.07
    \Helper        -0.07
    Del            -0.07
    Positive Logits
    itang          0.08
    toe            0.08
     tóc           0.08
     Toe           0.08
     thêm          0.08
     oily          0.08
     Southern      0.08
    ottest         0.08
     nark          0.08
    /link          0.07
    Activation Density: 0.000%

    No Known Activations