    Explanations

    references to feminism and feminist movements

    Negative Logits
    "scar"           -0.17
    " hal"           -0.16
    "Hal"            -0.15
    "s"              -0.15
    " Hal"           -0.15
    "hal"            -0.15
    "itud"           -0.15
    "ed"             -0.15
    "_FMT"           -0.15
    "Ñģк"            -0.15

    Positive Logits
    "ationToken"      0.16
    "assi"            0.16
    "EDIA"            0.15
    " objectType"     0.15
    "kowski"          0.14
    "-Christian"      0.14
    "缤"              0.14
    "ué"              0.14
    "cdecl"           0.14
    "otel"            0.14

    Activation density: 0.011%

    No Known Activations