INDEX
    Explanations
    Negative Logits
     refriger    -0.08
    ebilir    -0.08
    에서는    -0.07
     negligence    -0.07
     pig    -0.07
    쳤다    -0.07
     internally    -0.06
     sayesinde    -0.06
     характ    -0.06
     страш    -0.06
    Positive Logits
     tutto    0.06
     tells    0.05
    salt    0.05
    Reset    0.05
     suited    0.05
    Sad    0.05
     التعليم    0.05
     Json    0.05
    _ZONE    0.05
     amplify    0.05
    Activation Density: 0.045%

    No Known Activations