INDEX
    Explanations

    references to racial or ethnic identities and their implications

    Negative Logits
    Token           Logit
    " Monfieur"     -0.84
    " Shakspeare"   -0.83
    " pleaſure"     -0.81
    " itſelf"       -0.79
    " perſ"         -0.76
    " myſelf"       -0.75
    " ſy"           -0.75
    " Cæsar"        -0.75
    " Majefty"      -0.72
    " Theſe"        -0.72
    Positive Logits
    Token           Logit
    " born"         0.82
    " pinulongan"   0.60
    " Born"         0.59
    "rooted"        0.57
    " BORN"         0.54
    "heritage"      0.52
    " gốc"          0.51
    " geboren"      0.51
    " heritage"     0.51
    " szár"         0.51
    Activation Density: 0.379%

    No Known Activations