Explanations
expressions of confidence and self-assurance
Negative Logits
Ber     -0.62
West    -0.61
Gow     -0.58
ar      -0.56
ber     -0.56
par     -0.55
ir      -0.54
sel     -0.54
Brum    -0.54
ber     -0.53
Positive Logits
myſelf        1.44
itſelf        1.30
themſelves    1.25
confidence    1.25
confidence    1.25
Jefus         1.24
pleaſure      1.24
himſelf       1.21
Confidence    1.20
ſeveral       1.20
Activations Density 0.129%