Explanations
phrases related to actions done on someone's behalf or for their benefit
terms related to power dynamics and authority in relationships
Negative Logits
Token     Logit
uesday    -0.78
uve       -0.70
adena     -0.69
ammy      -0.69
tein      -0.66
Topic     -0.66
binary    -0.65
ãĤ£       -0.63
Bul       -0.63
usted     -0.63
Positive Logits
Token         Logit
steps         0.98
stretched     0.70
.             0.68
liest         0.68
books         0.67
selves        0.64
subordinates  0.61
Majesty       0.60
iest          0.60
lessness      0.60
Activations Density 0.262%