Explanations
phrases related to moral values and principles
references to core values and principles
Negative Logits
geon    -0.76
女      -0.73
--------------------------------------------------------  -0.72
waves   -0.69
fac     -0.68
sie     -0.67
\/\/    -0.64
OVA     -0.63
Sus     -0.63
brain   -0.62
Positive Logits
values         0.82
ideals         0.79
Values         0.76
beliefs        0.75
values         0.75
propositions   0.74
proposition    0.74
principles     0.73
embodied       0.72
sensibilities  0.71
Activations Density 0.016%