INDEX
Explanations
words related to morality and ethical considerations
Negative Logits
IsInitialized     -0.50
können            -0.42
setCancelable     -0.40
GTCX              -0.40
reactstrap        -0.39
égard             -0.38
StoryboardSegue   -0.38
Neglect           -0.38
mxArray           -0.38
useRouter         -0.38
Positive Logits
mor        0.84
moral      0.84
mor        0.83
MOR        0.80
moral      0.77
Morrison   0.77
morality   0.77
Morrison   0.76
Moral      0.75
MOR        0.74
Activations Density: 1.543%