INDEX
    NEGATIVE LOGITS
    "rachtet"             -0.57
    " Pact"               -0.53
    "KommentareTeilen"    -0.53
    "AnimationsModule"    -0.52
    " dictation"          -0.50
    "بوابة"               -0.49
    " cozin"              -0.49
    " dianteiro"          -0.48
    " Infórmanos"         -0.48
    " sneezing"           -0.47
    POSITIVE LOGITS
    "himself"             1.16
    " himself"            1.11
    "itself"              1.10
    " Yourself"           1.09
    " Himself"            1.09
    "herself"             1.07
    " yourself"           1.05
    " herself"            1.04
    " itself"             1.04
    "Yourself"            1.02
    Activation Density: 0.137%

    No Known Activations