    Explanations

    code, rules, and comments

    Negative Logits
      Token          Logit
      " сейчас"      -0.08
      " gekregen"    -0.08
      " развед"      -0.08
      " bác"         -0.08
      " entf"        -0.08
      "oriasis"      -0.08
      " согласно"    -0.08
      "fft"          -0.08
      "ückt"         -0.07
      " uchar"       -0.07
    Positive Logits
      Token            Logit
      " Mascul"        0.09
      " sprites"       0.07
      " masculine"     0.07
      (not rendered)   0.07
      "ТП"             0.07
      "েস"             0.07
      " impur"         0.07
      ".meg"           0.07
      " mascul"        0.07
      "TPL"            0.07
    Activation Density: 0.003%

    No Known Activations