    Explanations

    mentions or descriptions of male individuals

    references to males in various contexts

    Negative Logits
    Token             Logit
    "Tokens"          -0.81
    "roll"            -0.81
    "akings"          -0.77
    "ateg"            -0.73
    "planes"          -0.72
    " Annotations"    -0.72
    "Market"          -0.71
    "eries"           -0.70
    " Yards"          -0.69
    "wal"             -0.68

    Positive Logits
    Token             Logit
    " male"           3.56
    " female"         2.92
    " males"          2.87
    " Male"           2.75
    "male"            2.65
    "Male"            2.53
    " Female"         2.47
    "female"          2.44
    " females"        2.30
    "Female"          2.27

    Activation Density: 0.018%
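
    The page does not say how the density figure is computed. A minimal sketch, assuming it is the fraction of sampled token positions on which the feature's activation is above zero; the function name, the threshold parameter, and the sample data are all hypothetical:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: density = (number of active positions) / (total positions).
    This is an illustrative definition, not one confirmed by the source page.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 18 active positions out of 100,000 tokens.
sample = [1.0] * 18 + [0.0] * 99_982
density = activation_density(sample)
print(f"{density:.3%}")
```

    Under that assumption, a density of 0.018% would correspond to roughly 18 active positions per 100,000 tokens.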

    No Known Activations