Explanations
phrases related to societal structures or moral dilemmas
Negative Logits
Ô          -0.83
ãĥķãĤ©     -0.80
ãĤ¦ãĤ¹     -0.76
Scale      -0.74
Ext        -0.73
prints     -0.72
Ext        -0.70
Sac        -0.69
âĸĵ        -0.69
ovember    -0.69
Positive Logits
person     1.23
girl       1.16
guy        1.15
woman      1.12
man        1.07
guy        0.99
persons    0.98
Person     0.97
lady       0.96
spouse     0.96
Activations Density 0.632%