    Explanations

    references to societal roles and identities

    Negative Logits
    Token        Logit
    "atism"      -0.15
    "AML"        -0.14
    "ë²Į"        -0.14
    " bait"      -0.14
    "olik"       -0.14
    " Fraser"    -0.13
    "ùa"         -0.13
    "usk"        -0.13
    "defs"       -0.13
    "xac"        -0.13

    Positive Logits
    Token        Logit
    "ynos"       0.19
    "lint"       0.15
    "dae"        0.15
    "untu"       0.15
    "ttp"        0.14
    ".AF"        0.14
    "itra"       0.14
    "isini"      0.14
    "azen"       0.14
    " pou"       0.13

    Activation Density: 0.154%

    No Known Activations