INDEX
    Explanations

    references to female characters or entities in a narrative context

    Negative Logits
    Token             Logit
    "-"               -0.18
    "thon"            -0.16
    " Tage"           -0.16
    "/"               -0.14
    " cult"           -0.14
    " la"             -0.14
    " giorno"         -0.14
    "b"               -0.13
    " coefficient"    -0.13
    " dolore"         -0.13
    Positive Logits
    Token             Logit
    "zos"             0.18
    " misma"          0.17
    " academia"       0.16
    "mgr"             0.15
    " Glover"         0.15
    "λεκ"             0.15
    "quelle"          0.15
    "shal"            0.15
    "enos"            0.15
    "undry"           0.15
    Activation Density: 0.054%

    No Known Activations