Explanations
themes related to control, conformity, and individuality in societal contexts
Negative Logits
ureka    -0.14
ascus    -0.14
ê        -0.14
Ãłi      -0.13
695      -0.13
ayır     -0.13
utex     -0.12
atsby    -0.12
tiế      -0.12
ibold    -0.12
Positive Logits
obedience    0.50
obedient     0.49
obed         0.48
submission   0.47
submissive   0.44
obey         0.44
obe          0.40
compliance   0.40
Submission   0.39
serv         0.38
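The two lists rank vocabulary tokens by how strongly this feature pushes their output logits down or up. A common way to produce such lists, assumed here rather than stated on this page, is to take the dot product of the feature's decoder direction with each token's unembedding vector and sort the results. The sketch below uses made-up two-dimensional vectors; `logit_effects` and `top_logits` are hypothetical helpers, and the numbers are toy values, not this feature's real weights.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Assumed method: dot product of the feature's decoder direction
    with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}


def top_logits(effects: Dict[str, float],
               k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (top-k positive, top-k negative) token/effect pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


# Toy data (hypothetical): a 2-d decoder direction and four token vectors.
decoder_dir = [1.0, 0.0]
unembed = {
    "obey": [0.5, 0.1],
    "comply": [0.4, -0.2],
    "ureka": [-0.14, 0.3],
    "ê": [-0.1, 0.0],
}
effects = logit_effects(decoder_dir, unembed)
positive, negative = top_logits(effects, k=2)
print(positive)
print(negative)
```

In a real model the vectors have thousands of dimensions and the vocabulary has tens of thousands of entries, but the ranking step is the same sort.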
Activation Density: 0.501%
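Activation density is usually read as the share of tokens on which the feature fires. The page does not state the threshold or the token sample used, so the sketch below assumes "fires" means an activation strictly above zero; `activation_density` is a hypothetical helper and the input values are invented.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)


# Invented sample: the feature fires on 5 of 1000 tokens.
sample = [0.0] * 995 + [1.2] * 5
density = activation_density(sample)
print(f"{density}%")
```

Under this definition, a density of 0.501% means the feature is active on roughly 5 tokens in every 1000.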