INDEX
    Explanations
    NEGATIVE LOGITS
    Token        Logit
    " hay"       -0.07
    " Studi"     -0.07
    "se"         -0.06
    "Want"       -0.06
    " telesc"    -0.06
    "нение"      -0.06
    " linha"     -0.06
    " Studio"    -0.06
    " место"     -0.06
    "ados"       -0.06
    POSITIVE LOGITS
    Token                   Logit
    " moral"                0.08
    " morally"              0.07
    (token not rendered)    0.07
    " shake"                0.07
    "xoops"                 0.07
    " Moral"                0.07
    " piston"               0.07
    "514"                   0.06
    " misd"                 0.06
    " minimizing"           0.06
    Activation Density: 0.008%

    No Known Activations