INDEX
Explanations
phrases related to race, specifically focusing on the term "white"
references to race and ethnic identities, particularly those related to white individuals and systemic issues
Negative Logits
BIL          -0.79
bably        -0.78
utenberg     -0.72
ICLE         -0.69
pmwiki       -0.66
INAL         -0.66
incial       -0.65
inarily      -0.65
FORMATION    -0.64
ENTION       -0.64
Positive Logits
oak           0.72
peria         0.69
igans         0.67
papers        0.67
stadt         0.66
sup           0.66
oxide         0.66
cloth         0.65
sand          0.63
ander         0.62
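The card does not say how these two lists are produced. One common convention is to project the feature's decoder direction onto every token's unembedding vector and keep the k most negative and k most positive scores. The sketch below assumes that convention. `decoder_dir`, `unembed`, `vocab`, `feature_logits`, and `top_logits` are hypothetical names, and the toy inputs are invented for illustration.

```python
# Minimal sketch (assumption): a feature's "logits" are the dot products of its
# decoder direction with each token's unembedding column. All names here are
# hypothetical; unembed is a d_model x vocab_size nested list.

def feature_logits(decoder_dir, unembed):
    """Return one score per vocab token: dot(decoder_dir, unembed[:, t])."""
    d_model = len(decoder_dir)
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]

def top_logits(scores, vocab, k):
    """Return the k most negative and the k most positive (token, score) pairs."""
    pairs = sorted(zip(vocab, scores), key=lambda p: p[1])  # ascending by score
    negative = pairs[:k]          # most negative first
    positive = pairs[::-1][:k]    # most positive first
    return negative, positive

if __name__ == "__main__":
    # Toy example: a 2-dimensional model with a 4-token vocabulary.
    scores = feature_logits([1.0, 0.0], [[0.5, -0.2, 0.9, -0.7],
                                         [1.0, 1.0, 1.0, 1.0]])
    neg, pos = top_logits(scores, ["a", "b", "c", "d"], 2)
    print(neg)  # [('d', -0.7), ('b', -0.2)]
    print(pos)  # [('c', 0.9), ('a', 0.5)]
```

Under this reading, a token such as "oak" appears near the top of the positive list because its unembedding vector points roughly along the feature's decoder direction.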
Activation Density: 0.088%
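Activation density is usually read as the share of tokens on which the feature fires. The card gives no definition, so the sketch below assumes "fires" means an activation strictly above a threshold of 0. Under that assumption, 0.088% works out to roughly 88 active tokens per 100,000. The function name `activation_density` and its `threshold` parameter are hypothetical.

```python
# Minimal sketch (assumption): density = percentage of tokens whose feature
# activation exceeds a threshold (0.0 by default). Names are hypothetical.

def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

if __name__ == "__main__":
    # 88 active tokens out of 100,000 gives a density of 0.088%.
    acts = [0.0] * 99_912 + [1.0] * 88
    print(activation_density(acts))
```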