INDEX
    Explanations

    words relating to people/biology and sex

    Negative Logits
    Token         Logit
    " male"       -2.42
    "Male"        -2.20
    " Male"       -2.17
    "male"        -2.11
    " female"     -1.96
    " MALE"       -1.91
    "Female"      -1.86
    " Female"     -1.82
    "female"      -1.79
    " männ"       -1.67
    Positive Logits
    Token                Logit
    " EconPapers"        0.75
    "fromnode"           0.68
    "tagHelperRunner"    0.66
    "quias"              0.65
    " GOG"               0.62
    "makeText"           0.61
    "ientos"             0.59
    "rrggbb"             0.58
    " Trag"              0.56
    "umma"               0.56
    Activation Density: 9.934%

    No Known Activations