INDEX
Explanations
mentions of values, especially in the context of principles, ethics, and belief systems
references to ethical or moral values
Negative Logits
女        -0.79
geon     -0.73
igans    -0.71
waves    -0.69
sie      -0.68
bare     -0.68
rontal   -0.68
jen      -0.67
wa       -0.64
Sus      -0.63
Positive Logits
ideals          0.77
values          0.77
tolerance       0.75
iblings         0.74
principles      0.71
proposition     0.69
guiding         0.68
beliefs         0.68
sensibilities   0.68
propositions    0.66
Activations Density: 0.013%