INDEX
Explanations
phrases or sentences including both 'men' and 'women'
consistent references to groups, specifically emphasizing the presence of men and women together
Negative Logits
Token       Logit
Prol        -0.77
Contents    -0.75
actory      -0.72
:[          -0.71
ciation     -0.69
RP          -0.68
Collider    -0.67
GMT         -0.66
:(          -0.65
antage      -0.61
Positive Logits
Token          Logit
minorities     0.79
vice           0.76
toddlers       0.73
assorted       0.71
gans           0.71
infants        0.71
mothers        0.70
grandchildren  0.70
unborn         0.69
other          0.69
Activations Density 0.198%
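
As a rough illustration of the density figure above: assuming "activation density" means the fraction of sampled tokens on which the feature fires at all (the page itself does not define it), the value could be computed as in the sketch below. The function name, the zero threshold, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as "active" when its activation is strictly
    greater than the threshold (0.0 by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 tokens, 198 of which activate the feature.
sample = [1.0] * 198 + [0.0] * 99_802
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.198%
```

Under this reading, a density of 0.198% means the feature is active on roughly 2 of every 1,000 tokens.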