    Explanations

    instances of historical firsts or significant achievements by women and underrepresented groups

    Negative Logits
    Token          Logit
    " æĢ"          -0.15
    "Ãłng"         -0.15
    "º«"           -0.14
    "pill"         -0.14
    "оÑĢи"         -0.14
    "avid"         -0.13
    " Sheldon"     -0.13
    " upper"       -0.13
    "Proxy"        -0.13
    "anton"        -0.13

    Positive Logits
    Token          Logit
    " becoming"    0.33
    " become"      0.32
    " becomes"     0.32
    "bec"          0.32
    " Become"      0.28
    " became"      0.28
    " Bec"         0.27
    "Become"       0.24
    "æĪIJ为"        0.24
    " Became"      0.23
    Activation Density: 0.169%

    No Known Activations