Explanations
references to racial or ethnic identities and their implications
Negative Logits
Monfieur    -0.84
Shakspeare  -0.83
pleaſure    -0.81
itſelf      -0.79
perſ        -0.76
myſelf      -0.75
ſy          -0.75
Cæsar       -0.75
Majefty     -0.72
Theſe       -0.72
Positive Logits
born        0.82
pinulongan  0.60
Born        0.59
rooted      0.57
BORN        0.54
heritage    0.52
gốc         0.51
geboren     0.51
heritage    0.51
szár        0.51
Activations Density 0.379%