INDEX
Explanations
attributes related to power dynamics and authority figures
Negative Logits
œurs                -0.42
bibinfo             -0.41
DetailActivity      -0.40
絞                  -0.38
excru               -0.38
Easter              -0.36
出版年              -0.35
jangkau             -0.35
MessageTagHelper    -0.35
])):                -0.35
Positive Logits
power               0.68
arrogance           0.67
arrogant            0.66
swagger             0.65
proud               0.63
pompous             0.63
strut               0.61
prestige            0.60
confidently         0.60
pride               0.58
Activations Density: 0.344%