INDEX
    Explanations

    mentions of specific people

    pronouns and references to individuals, particularly female pronouns

    Negative Logits
    igham      -0.67
    reach      -0.65
    ¿½         -0.65
    iHUD       -0.60
    izo        -0.58
    ender      -0.58
     Allied    -0.58
    isky       -0.55
    hod        -0.55
     reperc    -0.55
    Positive Logits
    ï¸ı             0.80
    xual            0.71
    Ö¼              0.69
     respectively   0.68
    в               0.66
    anwhile         0.65
    к               0.63
    sic             0.62
    arine           0.62
    erent           0.61
    Activation Density: 0.623%

    No Known Activations