Explanations
identifiers related to personal characteristics or identities like race, religion, and sexual orientation
terms related to sexual orientation, race, and religion
Negative Logits
headline     -0.67
akings       -0.67
bulletin     -0.64
suites       -0.62
subsequ      -0.61
programme    -0.61
Programme    -0.61
Regulation   -0.61
prise        -0.60
execute      -0.59
Positive Logits
enough       1.10
enough       1.09
skinned      0.92
Enough       0.84
ophobic      0.82
herself      0.82
oneself      0.80
imately      0.79
myself       0.78
anymore      0.77
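The page does not say how the logit values above were computed. A common convention for feature dashboards is to project the feature's decoder direction through the model's unembedding matrix and rank tokens by the resulting score. The following is a minimal sketch under that assumption only; `top_logits`, its parameters, and the toy matrices are hypothetical and not taken from the page.

```python
from typing import List, Sequence, Tuple


def top_logits(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Rank vocabulary tokens by the direct logit effect of one feature.

    Assumption: decoder_dir has length d_model and unembed is a
    d_model x vocab_size matrix (rows indexed by residual dimension).
    """
    d_model = len(decoder_dir)
    if len(unembed) != d_model:
        raise ValueError("unembed must have one row per decoder dimension")
    # Score for token j = dot(decoder_dir, column j of the unembedding matrix).
    scores = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]          # most suppressed tokens, lowest score first
    positive = ranked[::-1][:k]    # most promoted tokens, highest score first
    return negative, positive


# Toy example: a 2-dimensional residual stream and a 3-token vocabulary.
neg, pos = top_logits(
    decoder_dir=[1.0, 0.0],
    unembed=[[0.5, -0.2, 1.1], [9.0, 9.0, 9.0]],
    vocab=["a", "b", "c"],
    k=1,
)
print(neg, pos)  # [('b', -0.2)] [('c', 1.1)]
```

Under this reading, a positive score means the feature pushes the model toward predicting that token, and a negative score means it pushes away from it.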
Activation Density: 0.223%
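Activation density is usually read as the share of tokens on which the feature fires. The page gives only the percentage, so the sketch below assumes "fires" means an activation strictly above a threshold of zero; `activation_density` and the toy activations are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens on which the feature's activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Toy activations for 8 tokens; the feature fires on 2 of them.
acts = [0.0, 0.0, 3.1, 0.0, 0.0, 0.0, 0.7, 0.0]
print(activation_density(acts))  # 0.25
```

On that reading, a density of 0.223% is a fraction of 0.00223, or roughly 2 firing tokens per 1,000.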