INDEX
Explanations
instances of persuasive language and power dynamics in interpersonal or political contexts
Negative Logits
î¡       -0.17
unami    -0.15
evi      -0.15
odesk    -0.15
_,,      -0.15
ropp     -0.15
niž      -0.15
seins    -0.15
ÂłÙħ     -0.14
iginal   -0.14
Positive Logits
fashioned   0.19
molded      0.18
crafted     0.17
stressed    0.17
buried      0.17
shattered   0.16
slashed     0.16
stretched   0.16
united      0.16
settled     0.16
Activations Density: 1.011%