    Explanations

    mentions of women and references to their roles or relationships in various contexts

    Negative Logits
    "ikip"          -0.15
    " Royale"       -0.15
    "itet"          -0.15
    "elier"         -0.15
    "amura"         -0.15
    "ÎŃ"            -0.14
    "thon"          -0.14
    "annon"         -0.13
    "118"           -0.13
    "ntag"          -0.13
    Positive Logits
    "iral"           0.19
    " alike"         0.17
    "ç̬"             0.14
    "CKET"           0.13
    "chron"          0.13
    "лиÑĪ"           0.13
    "ãĤ¹ãĤ¿ãĥ¼"      0.13
    "íķ"             0.13
    "822"            0.13
    "Desktop"        0.13
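    Logit lists like these are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and sorting the resulting per-token scores. This page does not state how its lists were computed, so treat the following as a minimal sketch under that assumption. The helper `top_logits`, the toy matrix `W_U`, and the `tok0`..`tok3` vocabulary are hypothetical illustrations, not data from this feature.

```python
def top_logits(decoder_dir, W_U, vocab, k=10):
    """Hypothetical helper: project a decoder direction through an unembedding.

    decoder_dir: list of length d_model (assumed feature decoder vector).
    W_U: d_model x vocab_size nested list (assumed unembedding matrix).
    vocab: list mapping index -> token string.
    Returns (top_positive, top_negative) lists of (token, score) pairs.
    """
    vocab_size = len(W_U[0])
    # Direct logit contribution of the feature to each vocabulary entry.
    scores = [
        sum(d * W_U[j][v] for j, d in enumerate(decoder_dir))
        for v in range(vocab_size)
    ]
    # Vocabulary indices sorted from most negative to most positive score.
    order = sorted(range(vocab_size), key=lambda v: scores[v])
    negative = [(vocab[v], scores[v]) for v in order[:k]]
    positive = [(vocab[v], scores[v]) for v in order[::-1][:k]]
    return positive, negative


# Toy example with a 2-dimensional model and a 4-token vocabulary.
W_U = [[0.3, -0.2, 0.1, -0.5],
       [0.0, 0.4, 0.0, 0.2]]
pos, neg = top_logits([1.0, 0.0], W_U, ["tok0", "tok1", "tok2", "tok3"], k=2)
print([t for t, _ in pos])  # prints ['tok0', 'tok2']
print([t for t, _ in neg])  # prints ['tok3', 'tok1']
```

    Sorting once and slicing both ends keeps the positive and negative lists consistent with each other; for a real vocabulary of tens of thousands of tokens, a vectorized matrix product would be the practical choice.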
    Activation Density: 0.013%
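    Activation density is usually read as the fraction of token positions on which the feature fires, so 0.013% corresponds to roughly 13 firings per 100,000 tokens. The following is a minimal sketch under that assumption. The function name `activation_density` and the "strictly greater than a threshold of 0" firing rule are hypothetical choices, not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical helper: fraction of token positions where the feature fires.

    A position counts as firing when its activation is strictly
    greater than `threshold` (assumed to be 0 by default).
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy data: 13 nonzero activations among 100,000 token positions.
acts = [0.0] * 99_987 + [1.0] * 13
print(f"{activation_density(acts):.3%}")  # prints 0.013%
```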

    No Known Activations