    Negative Logits
    Token           Logit
    " useClass"     -0.06
    "рест"          -0.06
    (not shown)     -0.06
    " Fiber"        -0.06
    "lara"          -0.06
    " miserable"    -0.05
    " thriller"     -0.05
    "itta"          -0.05
    "-public"       -0.05
    " pedestal"     -0.05

    Positive Logits
    Token           Logit
    "_tA"           0.08
    " Matth"        0.07
    " Trading"      0.07
    " chickens"     0.07
    "_FREQ"         0.07
    "latent"        0.07
    " вариант"      0.07
    " loung"        0.06
    "อเมร"          0.06
    " Final"        0.06

    Activation Density: 0.014%

    No Known Activations