INDEX
    Explanations

    mentions of specific female names, particularly in relation to discussions of their status or actions

    Negative Logits
    lee            -0.17
    achel          -0.15
     bir           -0.15
    apses          -0.14
    zer            -0.14
     Marketable    -0.14
    _fmt           -0.14
    uchs           -0.14
    aler           -0.14
    ACHE           -0.14
    Positive Logits
    agara          0.29
    elsen          0.22
    itos           0.17
    elson          0.16
     Ni            0.16
    olson          0.16
    hoff           0.16
    будь           0.16
    itsu           0.15
    itty           0.15
    Act Density 0.018%

    No Known Activations