INDEX
Explanations
expressions of gratitude and kindness
Negative Logits
contested   -0.72
scene       -0.69
competing   -0.68
SUP         -0.68
charged     -0.67
NC          -0.65
wang        -0.64
viks        -0.64
merce       -0.64
uter        -0.64
Positive Logits
gratitude    2.32
kindness     2.28
generosity   2.20
curiosity    2.17
humility     2.11
empathy      1.95
cynicism     1.91
optimism     1.88
honesty      1.85
arrogance    1.84
Activations Density 0.051%