Explanations
statements related to proving oneself or demonstrating something to others
expressions of proving oneself or demonstrating capabilities
Negative Logits
captcha        -0.78
Variant        -0.71
pse            -0.70
etting         -0.70
revolving      -0.66
ascript        -0.65
lihood         -0.65
ieri           -0.64
sugg           -0.64
acci           -0.62
Positive Logits
appreciation    1.03
superiority     0.92
dominance       0.88
solidarity      0.85
displeasure     0.85
gratitude       0.83
kindness        0.80
individuality   0.76
biz             0.75
defiance        0.74
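The page does not say how its logit lists were computed. Dashboards of this kind often project the feature's decoder direction through the model's unembedding matrix and report the lowest- and highest-scoring vocabulary tokens. The sketch below assumes exactly that. The names `top_bottom_logits`, `decoder_dir` and `W_U` are hypothetical, and the matrix is a plain list of rows (d_model x vocab_size).

```python
def top_bottom_logits(decoder_dir, W_U, vocab, k=10):
    """Rank vocabulary tokens by one feature's direct logit contribution.

    Hypothetical sketch: decoder_dir is the feature's decoder vector
    (length d_model), W_U is an assumed unembedding matrix given as
    d_model rows of vocab_size entries, and vocab lists token strings.
    """
    vocab_size = len(vocab)
    # Logit for token j = dot product of the decoder direction with column j of W_U.
    logits = [
        sum(decoder_dir[i] * W_U[i][j] for i in range(len(decoder_dir)))
        for j in range(vocab_size)
    ]
    # Token indices sorted from most negative to most positive logit.
    order = sorted(range(vocab_size), key=lambda j: logits[j])
    negative = [(vocab[j], logits[j]) for j in order[:k]]
    # Take the k largest, listed in descending order.
    positive = [(vocab[j], logits[j]) for j in reversed(order[-k:])]
    return negative, positive


# Toy usage: with an identity unembedding, the logits equal the decoder entries.
W_U = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
neg, pos = top_bottom_logits([0.5, -1.0, 2.0], W_U, ["a", "b", "c"], k=2)
print(neg)  # [('b', -1.0), ('a', 0.5)]
print(pos)  # [('c', 2.0), ('a', 0.5)]
```

Sorting once and slicing both ends keeps the negative and positive lists consistent with each other. On a real vocabulary, a partial selection such as `heapq.nsmallest` and `heapq.nlargest` would avoid a full sort.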
Activation Density: 0.186%
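The page does not define activation density. A common reading, assumed here, is the fraction of sampled token positions where the feature's activation is above zero. Under that assumption, 0.186% means roughly 186 firing positions per 100,000 tokens. The function name and the zero threshold below are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature fires.

    Assumption: "fires" means activation > threshold (default 0.0).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# 186 firing positions out of 100,000 gives 0.00186, i.e. 0.186%.
acts = [0.0] * 99_814 + [1.0] * 186
print(f"{activation_density(acts):.3%}")  # 0.186%
```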