    Explanations

    references to specific individuals and their roles or actions

    Negative Logits
    Token             Logit
    "وفاته"           -0.75
    " męski"          -0.75
    " himself"        -0.69
    " męskie"         -0.65
    " móg"            -0.65
    "himself"         -0.64
    " zijne"          -0.64
    " sám"            -0.63
    " boyhood"        -0.61
    "‍♂️"               -0.61
    Positive Logits
    Token             Logit
    " herself"         0.97
    " businesswoman"   0.68
    " lesbian"         0.65
    " feminist"        0.63
    "herself"          0.63
    " woman"           0.61
    " motherhood"      0.60
    " womanhood"       0.58
    " girl"            0.57
    " goddess"         0.57
    Activation Density: 2.304%

    No Known Activations