    Negative Logits
    Token               Logit
    " Gender"           -0.08
    "ते"                 -0.08
    "ède"               -0.08
    " Tän"              -0.08
    " Médio"            -0.08
    "_uniform"          -0.07
    " Jewelry"          -0.07
    ".Float"            -0.07
    "flowers"           -0.07
    "(TAG"              -0.07
    Positive Logits
    Token               Logit
    " nerd"             0.10
    " Debian"           0.09
    " parody"           0.09
    " aho"              0.09
    " bière"            0.09
    " knowledgeable"    0.09
    " munk"             0.08
    " Nerd"             0.08
    " বান"              0.08
    " rpm"              0.08
    Activation Density: 0.019%

    No Known Activations