Explanations
phrases mentioning both men and women
references to gender, particularly focusing on the roles and presence of women in various contexts
word2 in "word1 and word2", especially when the words are conceptually related
Negative Logits
Joy       -0.71
Thunder   -0.68
MN        -0.68
Dur       -0.68
Kevin     -0.67
Tok       -0.67
afety     -0.66
San       -0.66
GREEN     -0.65
Inv       -0.65
Positive Logits
alike          1.57
striped        1.00
respectively   0.99
combatants     0.85
halves         0.69
faiths         0.67
sexes          0.66
separated      0.66
equally        0.66
separately     0.65
Activations Density: 0.131%