INDEX
    Explanations
    Negative Logits
    不说              -0.08
    _notification     -0.08
    scratch           -0.07
     breathtaking     -0.07
     Fetish           -0.07
     Perkins          -0.06
     Newtown          -0.06
     Cocktail         -0.06
     Gothic           -0.06
     Morton           -0.06
    POSITIVE LOGITS
     radios           0.08
                      0.08
     Keep             0.07
    пут               0.07
    /GL               0.07
     EX               0.07
     Widgets          0.07
    levation          0.06
     UA               0.06
     SPEC             0.06
    Act Density 0.006%

    No Known Activations