INDEX
Explanations
principles and values within text, particularly those related to principles of governance, moral principles, and core values
references to foundational principles and values
Negative Logits
glitches    -0.70
tg          -0.67
guesses     -0.67
speculate   -0.66
rumors      -0.66
ammers      -0.66
queues      -0.65
ctic        -0.63
sites       -0.63
events      -0.63
Positive Logits
enshr        1.49
principles   1.24
embodied     1.22
underpin     1.16
guiding      1.13
Principles   1.13
articulated  1.08
principle    1.07
upheld       1.06
precept      1.05
Activations Density 0.233%