Explanations
Concepts related to identity and autonomy within domestic and social contexts
Negative Logits
adel          -0.15
cÃŃ           -0.14
wap           -0.14
θή            -0.14
جا            -0.14
عبر           -0.13
Ư             -0.13
Favor         -0.12
Medal         -0.12
ugal          -0.12
Positive Logits
autonomy       0.49
freedom        0.49
independence   0.44
autonomous     0.42
autonom        0.41
liberty        0.40
independent    0.39
freedoms       0.39
independ       0.38
Freedom        0.38
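Dashboards like this one typically build the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix and ranking the resulting per-token scores. The sketch below assumes exactly that (a plain dot product, ignoring any final layer norm); the function names and the toy vectors are hypothetical and are not taken from this page's tooling.

```python
# Hypothetical sketch: rank vocabulary tokens by how strongly a feature's
# decoder direction raises or lowers their logits.
# Assumption: logit effect of token j = sum_i decoder_dir[i] * unembed[i][j]
# (no final layer norm or bias applied).

def logit_effects(decoder_dir, unembed, vocab):
    """Return a list of (token, score) pairs, one per vocabulary entry.

    decoder_dir: list of floats, length d_model
    unembed: d_model rows, each a list of len(vocab) floats
    """
    scores = []
    for j, tok in enumerate(vocab):
        s = sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        scores.append((tok, s))
    return scores


def top_and_bottom(scores, k):
    """Return the k highest-scoring and k lowest-scoring tokens.

    The bottom list is ordered most negative first, like the dashboard's
    Negative Logits column.
    """
    ranked = sorted(scores, key=lambda pair: pair[1], reverse=True)
    return ranked[:k], ranked[::-1][:k]


# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
vocab = ["freedom", "liberty", "Medal", "adel"]
decoder_dir = [1.0, 0.0]
unembed = [
    [0.5, 0.4, -0.1, -0.2],
    [0.3, 0.3, 0.3, 0.3],
]
top, bottom = top_and_bottom(logit_effects(decoder_dir, unembed, vocab), k=2)
print(top)     # [('freedom', 0.5), ('liberty', 0.4)]
print(bottom)  # [('adel', -0.2), ('Medal', -0.1)]
```

With real weights, the same ranking over the full vocabulary would produce lists shaped like the two above: semantically coherent tokens at the top, and mostly unrelated fragments at the bottom.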
Activation density: 0.493%
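Activation density is usually read as the share of sampled tokens on which the feature fires. A density of 0.493% therefore means roughly 5 tokens in every 1,000. The sketch below assumes that "fires" means an activation strictly above zero; the actual threshold and token sample behind this dashboard's figure are not stated, and the function name is hypothetical.

```python
# Hypothetical sketch: activation density as the percentage of tokens on
# which a feature's activation exceeds a threshold (assumed to be 0.0).

def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above threshold."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 1,000 tokens, of which 5 activate the feature.
acts = [0.0] * 995 + [1.2] * 5
print(activation_density(acts))  # 0.5
```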