Explanations
references to racial and ethnic identity
Negative Logits
.cf      -0.14
askan    -0.14
ocale    -0.14
ilon     -0.14
Leer     -0.14
истра    -0.13
è¹       -0.13
thổ      -0.13
ickle    -0.13
خو       -0.12
Positive Logits
look     1.16
looks    1.12
look     1.00
Look     0.98
looked   0.97
LOOK     0.97
looks    0.95
Looks    0.93
Look     0.91
_look    0.88
Activations Density 0.649%