    Negative Logits
    Token             Logit
    " khu"            -0.45
    " ubica"          -0.43
    " jaren"          -0.43
    " Freeman"        -0.43
    " właśnie"        -0.42
    "addAttribute"    -0.42
    "backends"        -0.42
    " tijden"         -0.41
    " Harris"         -0.41
    " there"          -0.40
    Positive Logits
    Token             Logit
    " Soap"           1.34
    "Soap"            1.30
    "soap"            1.28
    " soap"           1.28
    "SOAP"            1.05
    " SOAP"           1.03
    " soaps"          0.92
    " jabón"          0.91
    " soapy"          0.85
    " sabun"          0.81
    Activation Density: 0.001%

    No Known Activations