INDEX
Explanations
phrases related to personal autonomy and decision-making
references to the concept of self-determination or autonomy
Negative Logits
Mub      -0.72
onite    -0.71
ammy     -0.70
ayne     -0.70
Derby    -0.70
rise     -0.69
etta     -0.69
vals     -0.69
grade    -0.67
iard     -0.67
Positive Logits
selves        0.84
underwater    0.82
selves        0.77
explan        0.70
creatively    0.70
fict          0.70
altru         0.68
conduc        0.68
destruct      0.66
ashamed       0.65
Activations Density: 0.045%