    Explanations

    расстоя

    NEGATIVE LOGITS
    " прод"          -0.08
    " françaises"    -0.08
    "_clause"        -0.07
    " affirm"        -0.07
    " woh"           -0.07
    " Scott"         -0.07
    "umble"          -0.07
    " thể"           -0.07
    " misc"          -0.07
    (not rendered)   -0.07
    POSITIVE LOGITS
    "lust"            0.08
    " bero"           0.08
    " tslint"         0.07
    " nickel"         0.07
    "likes"           0.07
    " Ofic"           0.07
    "calling"         0.07
    "bands"           0.07
    " Пас"            0.07
    " действий"       0.07
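    The logit lists rank vocabulary tokens by how strongly this feature pushes each token's output logit up or down. A common way to produce such lists, assumed here rather than confirmed by this page, is to project the feature's decoder direction through the model's unembedding matrix and keep the top-k and bottom-k entries. The sketch uses plain Python lists; `logit_effects`, `top_k`, and the toy matrices are hypothetical, and it ignores any final layer norm.

```python
from typing import List, Tuple

def logit_effects(decoder_dir: List[float], unembed: List[List[float]]) -> List[float]:
    """Direct logit effect per vocabulary token (hypothetical helper).

    decoder_dir: the feature's decoder vector, length d_model.
    unembed: d_model rows, each holding one score per vocabulary token.
    """
    n_vocab = len(unembed[0])
    # Dot product of the decoder direction with each column of the unembedding.
    return [
        sum(decoder_dir[d] * unembed[d][v] for d in range(len(decoder_dir)))
        for v in range(n_vocab)
    ]

def top_k(effects: List[float], tokens: List[str], k: int,
          largest: bool = True) -> List[Tuple[str, float]]:
    """Return the k tokens with the largest (or smallest) effects, in ranked order."""
    order = sorted(range(len(effects)), key=lambda v: effects[v], reverse=largest)
    return [(tokens[v], effects[v]) for v in order[:k]]

# Toy example (hypothetical values): d_model = 2, vocabulary of 3 tokens.
tokens = ["a", "b", "c"]
effects = logit_effects([1.0, -0.5], [[0.1, 0.2, -0.3], [0.4, 0.0, 0.2]])
positive = top_k(effects, tokens, k=1, largest=True)   # most promoted token
negative = top_k(effects, tokens, k=1, largest=False)  # most suppressed token
```

    Sorting token indices by effect and slicing keeps the sketch short; for a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid a full sort.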
    Activation density: 0.001%
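    Activation density is read here as the percentage of sampled tokens on which the feature's activation is positive. That definition, and the `threshold` parameter of the hypothetical `activation_density` helper, are assumptions. A minimal sketch of the arithmetic:

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens on which the feature fires (activation > threshold)."""
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)

# Hypothetical data: one firing among 100,000 sampled tokens.
acts = [0.0] * 100_000
acts[42] = 3.5
density = activation_density(acts)  # 0.001 (percent)
```

    Under this definition, 0.001% means the feature fires on about one token in every 100,000 sampled.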

    No Known Activations