Explanations
words related to moral or ethical duty
mention of the concept of responsibility
Negative Logits
Token      Logit
arthed     -0.72
sell       -0.70
vae        -0.69
Fort       -0.66
Hig        -0.65
Ellison    -0.65
Stall      -0.64
Maver      -0.64
Ashton     -0.64
Tigers     -0.64
Positive Logits
Token              Logit
responsibility     1.34
responsibilities   1.11
Responsibility     1.05
respons            0.97
ignty              0.90
respons            0.89
responsible        0.86
culp               0.85
lessly             0.81
obligation         0.81
Activations Density: 0.012%