INDEX
    Explanations

    references to or descriptions of women

    mentions of women in various contexts

    Negative Logits
    "ypes"         -0.92
    " Flavoring"   -0.89
    "agascar"      -0.79
    "ython"        -0.79
    "raltar"       -0.78
    "UFF"          -0.76
    "vernment"     -0.76
    "inctions"     -0.73
    "rador"        -0.72
    "ernels"       -0.72
    Positive Logits
    "izer"         1.09
    "hood"         1.05
    "folk"         0.95
    "pher"         0.94
    "cule"         0.87
    " Louise"      0.82
    " woman"       0.81
    "izers"        0.80
    " who"         0.80
    " herself"     0.79
    Activation Density: 0.048%

    No Known Activations