INDEX
Explanations
phrases indicating representation or action taken on someone's behalf
phrases indicating representation or advocacy for others
Negative Logits
ined        -0.60
itals       -0.58
Sapp        -0.58
crotch      -0.58
diapers     -0.58
bern        -0.57
Conway      -0.57
plum        -0.57
stir        -0.56
Marathon    -0.56
Positive Logits
selves      0.81
behalf      0.76
avement     0.75
anding      0.72
giving      0.72
acion       0.69
ours        0.68
oux         0.68
uary        0.65
edIn        0.65
Activations Density 0.023%