INDEX
Explanations
expressions of choice and individual discretion
Negative Logits
uzzi      -0.18
izzy      -0.14
itious    -0.14
shed      -0.14
egral     -0.14
ıcı       -0.13
iphy      -0.13
datal     -0.13
EEK       -0.13
bourg     -0.13
Positive Logits
discretion    0.39
subjective    0.35
personal      0.33
choice        0.30
decision      0.28
judgment      0.28
preference    0.28
opinion       0.27
individual    0.26
personal      0.26
Activations Density 0.193%
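
For orientation, an activation density figure like the one above is commonly read as the fraction of sampled token positions on which the feature fires. The sketch below illustrates that reading only. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical, and the dashboard's actual computation is not stated in the source.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature is active.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 1000 token positions, 2 of which activate the feature.
sample = [0.0] * 998 + [1.7, 0.4]
print(f"{activation_density(sample):.3%}")
```

Under this reading, a density of 0.193% would correspond to the feature being active on roughly 2 of every 1000 sampled tokens.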