INDEX
    NEGATIVE LOGITS
     sắc
    -0.08
     Horde
    -0.08
     तकनी
    -0.08
     Chow
    -0.08
    یز
    -0.08
    یزی
    -0.07
     soot
    -0.07
     Socks
    -0.07
    نظر
    -0.07
     Clears
    -0.07
    POSITIVE LOGITS
    won
    0.09
    abei
    0.08
     Gi
    0.08
     yielded
    0.08
     yana
    0.08
     willingly
    0.08
    ayin
    0.07
    pons
    0.07
    0.07
    ange
    0.07
    Activation Density: 0.006%

    No Known Activations