    Explanations

    mentions of specific individuals, particularly female figures

    Negative Logits
    "iferation"    -0.80
    "WC"           -0.72
    "HCR"          -0.69
    "UST"          -0.67
    "udic"         -0.66
    " Nanto"       -0.64
    "VL"           -0.64
    "HUD"          -0.63
    "essee"        -0.63
    "BD"           -0.63

    Positive Logits
    " Betty"       0.92
    "yip"          0.88
    "keye"         0.84
    "rics"         0.82
    "rand"         0.77
    "hesda"        0.76
    "oro"          0.75
    "rants"        0.75
    " Seym"        0.74
    "plin"         0.72

    Activation Density: 0.013%

    No Known Activations