INDEX
Explanations
phrases related to ethical and moral considerations
references to ethical and moral issues
Negative Logits
jong      -0.84
hof       -0.83
acular    -0.81
down      -0.79
gow       -0.75
peak      -0.75
xual      -0.74
upt       -0.74
ings      -0.71
aunts     -0.71
Positive Logits
onomic       1.02
dile         0.99
ethical      0.98
ethical      0.87
conscience   0.79
hazard       0.79
ethics       0.78
qual         0.76
violations   0.76
disclosure   0.75
Activations Density 0.016%