INDEX
    Explanations

    personal pronouns and gendered nouns

    references to male and female characters

    Negative Logits
    Token             Logit
    "grave"           -0.76
    "assing"          -0.75
    "stellar"         -0.66
    "ylon"            -0.65
    "igmatic"         -0.64
    "kefeller"        -0.63
    " Observatory"    -0.62
    "Ĥª"              -0.61
    "irm"             -0.61
    "cgi"             -0.60
    Positive Logits
    Token             Logit
    "mos"             0.95
    " Majesty"        0.94
    "'ll"             0.93
    " didn"           0.86
    "'d"              0.85
    " knew"           0.84
    " wanted"         0.82
    " knows"          0.81
    " didnt"          0.80
    " hates"          0.80
    Activation Density: 0.237%

    No Known Activations