INDEX
    Explanations
    Negative Logits
    Token             Logit
    " lovers"         -0.07
    " Dalton"         -0.07
    "_CENTER"         -0.07
    " tongues"        -0.07
    " GX"             -0.06
    " Johannes"       -0.06
    " degrade"        -0.06
    " competitions"   -0.06
    " Twilight"       -0.06
    " Powder"         -0.06
    Positive Logits
    Token             Logit
    " app"            0.07
    " точ"            0.07
    "_shot"           0.06
    "FullScreen"      0.06
    " sch"            0.06
    "(cond"           0.06
    " přist"          0.06
    "buah"            0.06
    " olmayan"        0.06
                      0.06
    Activation Density: 0.005%

    No Known Activations