    Negative Logits
     Ena        -0.08
     probable   -0.08
     regulator  -0.08
     Flair      -0.07
     Ballet     -0.07
     дополн     -0.07
     Baking     -0.07
    058         -0.07
     nâng       -0.07
     diesen     -0.07
    Positive Logits
    hund        0.08
     polling    0.08
     Ish        0.08
     militar    0.07
     sheer      0.07
     dono       0.07
    Mom         0.07
     Husband    0.07
     workers    0.07
                0.07
    Activation Density: 0.002%

    No Known Activations