INDEX
    Explanations

    references to female characters or titles related to women

    Negative Logits
    Token              Logit
    "en"               -0.75
    "al"               -0.73
    "u"                -0.71
    " Warszawie"       -0.66
    " Wilmington"      -0.66
    " Roswell"         -0.65
    " prostitutes"     -0.63
    " Exxon"           -0.62
    " rito"            -0.62
    " encephalitis"    -0.62
    Positive Logits
    Token              Logit
    " LADY"            1.28
    " Lady"            1.23
    "LADY"             1.12
    "Lady"             1.09
    " lady"            1.08
    " Ladybug"         0.98
    "lady"             0.96
    " Ladies"          0.85
    " ladybug"         0.85
    " ladies"          0.83
    Activation Density: 0.006%

    No Known Activations