INDEX
Explanations
concepts related to moral values and personal development
Negative Logits
ond        -0.16
preg       -0.15
alez       -0.15
newcomer   -0.14
Chain      -0.14
gangs      -0.14
nga        -0.14
riz        -0.13
knack      -0.13
bies       -0.13
Positive Logits
human      0.32
citizen    0.31
citizens   0.31
cit        0.29
Cit        0.28
Citizens   0.27
Citizen    0.27
humans     0.27
citiz      0.26
human      0.26
Activations Density: 0.159%