Explanations
statements of opinion and expressions of personal beliefs
Negative Logits
Token             Logit
transición        -0.51
SequentialGroup   -0.46
financieras       -0.45
réfugiés          -0.44
DockStyle         -0.44
CallOverrides     -0.42
IsMutable         -0.42
éxitos            -0.42
高质量            -0.42
arquitetura       -0.41
Positive Logits
Token             Logit
moral             0.69
morally           0.69
justifiable       0.59
morals            0.57
moral             0.56
argument          0.56
Moral             0.56
justified         0.54
immoral           0.54
ethically         0.54
Activations Density: 0.794%