    NEGATIVE LOGITS
    Token             Logit
    "owl"             -0.07
    " carg"           -0.06
    " державної"      -0.06
    "yses"            -0.06
    " Федераль"       -0.06
    " Manafort"       -0.06
    "slick"           -0.06
    " sức"            -0.06
    " FileWriter"     -0.06
    "162"             -0.06

    POSITIVE LOGITS
    Token             Logit
    " beginners"      0.14
    " beginner"       0.14
    " Beginner"       0.11
    " Beginners"      0.08
    ",没有"            0.07
    "'),↵↵"           0.07
    " freshman"       0.07
    " experienced"    0.07
    " accepting"      0.07
    "‬↵"               0.07

    Activation Density: 0.004%

    No Known Activations