INDEX
Explanations
mentions of fundamental concepts or beliefs
references to principles, particularly in legal and ethical contexts
Negative Logits
ammers       -0.71
minster      -0.70
leneck       -0.68
ilation      -0.67
NetMessage   -0.67
quer         -0.67
eor          -0.66
essions      -0.65
dos          -0.64
hiba         -0.64
Positive Logits
principles   1.01
principle    0.91
guiding      0.90
ciples       0.88
underlying   0.83
ually        0.82
underpin     0.81
Principles   0.80
precept      0.73
cipled       0.72
Activations Density 0.023%