INDEX
Explanations
phrases related to personal attributes or characteristics, such as gender, nationality, or physical appearance
terms related to gender identity and societal perceptions of gender
Negative Logits
trickle       -0.80
urgency       -0.78
patience      -0.78
distraction   -0.76
rush          -0.76
impat         -0.72
overload      -0.72
hurry         -0.72
optimization  -0.72
payoff        -0.71
Positive Logits
Therefore     0.75
Therefore     0.75
ancestry      0.74
nationality   0.73
Origin        0.72
properties    0.71
onyms         0.70
marrying      0.70
chromosomes   0.69
pronouns      0.69
Activations Density: 0.922%