Explanations
phrases related to race or skin color
references to people of color
Negative Logits
preparations     -0.60
hallucinations   -0.58
Reloaded         -0.58
Features         -0.57
symptoms         -0.57
rounds           -0.57
veins            -0.56
seams            -0.54
needles          -0.54
thumbnails       -0.54
Positive Logits
ortunately        0.96
course            0.85
icial             0.80
whom              0.75
pires             0.74
course            0.70
iciency           0.67
idth              0.66
sted              0.66
ramer             0.65
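The page does not say how these two lists are produced. A common approach for sparse-autoencoder feature dashboards is to project the feature's decoder direction through the model's unembedding matrix and list the tokens with the lowest and highest resulting weights. The sketch below assumes that approach; the function names `logit_weights` and `top_logits` and the toy vectors are hypothetical, and the code is plain Python rather than any particular library's API.

```python
from typing import Dict, List, Tuple


def logit_weights(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot the (assumed) feature decoder direction with each token's unembedding column."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, col))
            for tok, col in unembed.items()}


def top_logits(weights: Dict[str, float],
               k: int = 10) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, weight) pairs."""
    ranked = sorted(weights.items(), key=lambda kv: kv[1])  # ascending by weight
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


# Toy example with a 2-dimensional residual stream and three tokens.
decoder_dir = [1.0, -0.5]
unembed = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.5, 0.5]}
neg, pos = top_logits(logit_weights(decoder_dir, unembed), k=1)
print(neg, pos)
```

With k = 10 this would yield two ten-row lists shaped like the tables above. Distinct vocabulary entries that render identically (for example, a token with and without a leading space) would appear as separate rows, which may explain why "course" is listed twice.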
Activation Density: 0.091%
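Activation density is usually read as the percentage of sampled tokens on which the feature fires. As a rough guide, 0.091% corresponds to about 91 active tokens per 100,000. The minimal sketch below computes the figure under the assumption that "active" means an activation strictly above a threshold of 0. The function name `activation_density` and the threshold are hypothetical choices; the dashboard's actual sampling and threshold are not stated.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation is strictly above `threshold` (assumed definition)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# One active token out of 1,000 gives a density of 0.1%.
print(activation_density([0.0] * 999 + [2.3]))
```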