    Negative Logits
     femininos    -0.90
     pitié        -0.88
     célèbres     -0.85
     feroit       -0.85
     avoient      -0.83
     étoient      -0.82
     florales     -0.82
     dianteiro    -0.81
     digitais     -0.81
     militaires   -0.79
    Positive Logits
     che           0.59
     cer           0.57
     membrane      0.57
     ze            0.55
     synthase      0.54
     sp            0.53
     tri           0.53
     se            0.52
     P             0.51
     Se            0.51
    Activation Density: 0.232%

    No Known Activations