INDEX
    Explanations

    words related to philosophical concepts and qualities, as well as characteristics associated with gender stereotypes

    Negative Logits
    " Canaver"       -0.76
    "etsy"           -0.71
    "QUIRE"          -0.71
    "の魔"           -0.68
    "ISTORY"         -0.65
    "CLAIM"          -0.64
    "RIPT"           -0.64
    "acas"           -0.64
    " Sparks"        -0.62
    "作"             -0.60

    Positive Logits
    " ones"           0.89
    " versa"          0.86
    " etc"            0.80
    "-)"              0.79
    "+."              0.76
    "-."              0.74
    "*."              0.73
    " respectively"   0.72
    "entric"          0.71
    "ecided"          0.70
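Dashboards of this kind commonly produce top positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the largest and smallest entries. The sketch below assumes that method (the page itself does not state it), ignores the final layer norm, and uses hypothetical helper names (`logit_effects`, `top_and_bottom`) and made-up toy numbers rather than this feature's actual weights.

```python
from typing import List, Sequence, Tuple

# Hypothetical helpers: assumes the logit lists come from a direct
# decoder-direction x unembedding projection, ignoring the final layer norm.


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Dot the feature's decoder direction (length d_model) with every
    column of the unembedding matrix W_U (d_model rows x vocab columns)."""
    d_model = len(decoder_dir)
    vocab = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(vocab)
    ]


def top_and_bottom(effects: Sequence[float], tokens: Sequence[str], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, effect) pairs,
    each list ordered from strongest to weakest."""
    ascending = sorted(range(len(effects)), key=lambda t: effects[t])
    descending = ascending[::-1]
    negative = [(tokens[t], effects[t]) for t in ascending[:k]]
    positive = [(tokens[t], effects[t]) for t in descending[:k]]
    return positive, negative


# Toy example: d_model = 2 and a made-up vocabulary of 4 tokens.
decoder_dir = [1.0, 0.5]
unembed = [[0.2, -0.4, 0.9, 0.0],
           [0.4, 0.2, -0.2, -1.0]]
tokens = ["a", "b", "c", "d"]

positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed), tokens, k=2)
print("positive:", positive)
print("negative:", negative)
```

With a real model, `decoder_dir` would be one row of the sparse-autoencoder decoder and `unembed` the model's W_U; an array library would replace the nested loops.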
    Activation Density: 0.406%
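Activation density is commonly reported as the share of token positions on which a feature is active, so 0.406% corresponds to roughly 406 active positions per 100,000 tokens. A minimal sketch, assuming "active" means an activation strictly above zero; both that threshold and the hypothetical `activation_density` helper are assumptions, not the dashboard's documented definition.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    The strict `> 0.0` default is an assumption about what counts as "active".
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy activations for 1,000 token positions, 3 of them active.
acts = [0.0] * 996 + [1.2, 0.3, 0.0, 2.0]
print(f"{activation_density(acts):.3f}%")  # 0.300%
```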

    No Known Activations