    NEGATIVE LOGITS
    Token             Logit
    "Female"          -0.09
    " всему"          -0.08
    " Female"         -0.08
    " females"        -0.08
    "-performing"     -0.08
    "ох"              -0.08
    " somos"          -0.07
    " female"         -0.07
    "-purple"         -0.07
    " THC"            -0.07
    POSITIVE LOGITS
    Token             Logit
    " manus"          0.08
    "fusion"          0.08
    " jakt"           0.07
    "علام"            0.07
    " refug"          0.07
    " secretion"      0.07
    " prema"          0.07
    " fusion"         0.07
    " injunction"     0.07
    "arman"           0.07
    Activation Density: 0.001%

    No Known Activations