Explanations
mentions of fundamental beliefs or rules
references to established principles or guidelines
Negative Logits
minster    -0.74
ilation    -0.70
sg         -0.69
leneck     -0.68
otte       -0.66
aughter    -0.65
olla       -0.65
eor        -0.64
reen       -0.64
ready      -0.64
Positive Logits
principles   1.12
ciples       0.95
guiding      0.95
principals   0.91
Principles   0.91
underpin     0.88
principle    0.85
ophical      0.83
underlying   0.80
precept      0.77
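Lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and reading off the largest and smallest entries. The page does not say how its values were computed, so the sketch below assumes that method. The helper names `logit_effects` and `top_and_bottom`, the matrix layout (one row per residual dimension), and the toy numbers are all hypothetical.

```python
from typing import List, Sequence, Tuple

def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Dot the feature's decoder direction with each token's unembedding column.

    Assumed layout: unembed is d_model x vocab_size (hypothetical).
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]

def top_and_bottom(vocab: Sequence[str], effects: Sequence[float], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, effect) pairs."""
    pairs = list(zip(vocab, effects))
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative

# Toy example: 2-dimensional residual stream, 3-token vocabulary (made-up values).
vocab = ["principles", "minster", "ready"]
decoder_dir = [1.0, 0.5]
unembed = [[1.0, -1.0, 0.2],
           [0.4, 0.0, -0.8]]
effects = logit_effects(decoder_dir, unembed)
positive, negative = top_and_bottom(vocab, effects, k=1)
print(positive, negative)
```

With a real model the same two steps run over the full vocabulary, and k = 10 would give lists shaped like the ones on this page.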
Activations Density 0.018%
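Activation density is usually read as the share of sampled tokens on which the feature fires, that is, has an activation above zero. Assuming that definition and a zero threshold (the page states neither), 0.018% means the feature fires on roughly one token in every 5,556, since 100 / 0.018 ≈ 5,556. A minimal sketch with hypothetical helper names:

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds the (assumed) threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

def tokens_per_activation(density_percent: float) -> float:
    """Average number of tokens between firings, given density in percent."""
    return 100.0 / density_percent

print(activation_density([0.0, 0.0, 1.3, 0.0]))  # one of four tokens fires
print(round(tokens_per_activation(0.018)))       # density reported on this page
```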