INDEX
Explanations
phrases expressing pride and support towards different aspects or groups
expressions of pride
Negative Logits
Token           Logit
block           -0.69
ences           -0.67
enz             -0.65
LW              -0.64
Ess             -0.62
alternatives    -0.62
Option          -0.61
erg             -0.60
specified       -0.60
situations      -0.60
Positive Logits
Token           Logit
proud           3.76
ashamed         1.89
Proud           1.87
proudly         1.86
pride           1.69
pleased         1.62
thankful        1.52
roud            1.50
grateful        1.50
jealous         1.45
Activations Density 0.017%