INDEX
Explanations
mentions of values and principles
arguments related to morality and ethical considerations in societal frameworks
Negative Logits
lbs          -0.72
heast        -0.65
Pwr          -0.65
NES          -0.63
UFC          -0.61
Sprint       -0.60
tips         -0.59
VIP          -0.59
laun         -0.59
Emergency    -0.59
Positive Logits
epist            1.15
philosophers     0.97
presupp          0.96
insofar          0.95
empirical        0.92
normative        0.91
implicitly       0.89
subjective       0.86
intrinsically    0.86
empir            0.86
Activations Density: 2.124%