    Explanations

    references to female protagonists or figures of significance

    Negative Logits
    ners      -0.18
    pun       -0.16
    igkeit    -0.16
    away      -0.16
    olas      -0.15
    ty        -0.15
    mate      -0.15
     रहन      -0.15
    ster      -0.15
    tures     -0.15
    Positive Logits
    ines      0.38
    ics       0.30
    ine       0.27
    ically    0.26
    INES      0.25
    ism       0.23
    ína       0.23
    INE       0.22
     Worship  0.21
    icism     0.21
    Act Density 0.022%

    No Known Activations