INDEX
    Explanations

    references to people's names

    Negative Logits
    Token         Logit
    "awar"        -0.74
    "*/("         -0.74
    "unin"        -0.74
    "hire"        -0.72
    "heimer"      -0.72
    "umbn"        -0.71
    "olesc"       -0.70
    "shit"        -0.69
    "abol"        -0.69
    "fare"        -0.68
    Positive Logits
    Token         Logit
    " Lynn"        1.04
    " Louise"      1.01
    " Patricia"    1.00
    " Marie"       0.98
    " Nicole"      0.95
    " Jane"        0.92
    " Garcia"      0.92
    " Gloria"      0.91
    " Lopez"       0.91
    " Sue"         0.91
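    The two lists above rank vocabulary tokens by how strongly this feature pushes their logits up or down. The dashboard does not say how those scores are produced. A common approach for SAE feature dashboards is to take the dot product of the feature's decoder direction with each token's unembedding vector, then sort. The following is a minimal pure-Python sketch under that assumption. The function names `logit_effects` and `top_k` and the toy vectors are hypothetical and used only for illustration.

```python
from typing import Dict, List, Sequence, Tuple

def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Assumed scoring: dot product of the feature's decoder direction
    with each token's unembedding vector (hypothetical helper)."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}

def top_k(effects: Dict[str, float],
          k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (most positive, most negative) token/score pairs, k of each."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]         # most negative first
    positive = ranked[::-1][:k]   # most positive first
    return positive, negative

# Toy 2-dimensional example; the vectors are made up, not taken from the model.
decoder_dir = [1.0, 0.5]
unembed = {
    " Lynn": [0.8, 0.4],
    " Marie": [0.6, 0.6],
    "shit": [-0.5, -0.4],
    "fare": [-0.2, -0.8],
}
positive, negative = top_k(logit_effects(decoder_dir, unembed), k=2)
print([tok for tok, _ in positive])  # [' Lynn', ' Marie']
print([tok for tok, _ in negative])  # ['shit', 'fare']
```

    The leading space in tokens such as " Lynn" is part of the token string (word-initial tokens in BPE vocabularies), so it is kept as a dictionary key here rather than stripped.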
    Activation density: 0.065%
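    Activation density is read here as the fraction of sampled tokens on which the feature fires. The dashboard does not state the exact firing threshold or the sample size, so the sketch below simply assumes that an activation strictly above 0 counts as firing. Under that reading, 0.065% corresponds to roughly 65 firings per 100,000 tokens. The helper `activation_density` and the synthetic activation list are hypothetical.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    return active / total if total else 0.0

# Synthetic data: 65 firing tokens spread over 100,000 tokens.
acts = [0.0] * 100_000
for i in range(65):
    acts[i * 1000] = 1.0
print(f"{activation_density(acts):.3%}")  # 0.065%
```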

    No Known Activations