INDEX
Explanations
phrases indicating alignment or agreement with certain principles, values, or norms
phrases that indicate conformity or alignment with policies, values, or standards
Negative Logits
gins       -0.73
enaries    -0.68
roma       -0.67
zon        -0.66
ikes       -0.63
bodied     -0.63
reau       -0.63
eware      -0.62
erers      -0.61
verage     -0.59
Positive Logits
tradition          1.05
expectations       0.93
prevailing         0.92
precedent          0.92
norms              0.90
tenets             0.87
principles         0.87
recommendations    0.86
orthodoxy          0.83
reality            0.83
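Dashboards like this one commonly derive the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix, then reading off the largest and smallest scores. The sketch below assumes that procedure. The helper names `logit_effects` and `top_and_bottom`, the `decoder_direction` vector, and the `unembed` table are hypothetical toy values, not data from this feature.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]

def logit_effects(decoder_direction: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot each token's unembedding column with the feature's decoder direction.

    Assumption: this projection is how the logit lists are produced.
    """
    return {
        token: sum(d * u for d, u in zip(decoder_direction, column))
        for token, column in unembed.items()
    }

def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most positive and the k most negative token scores."""
    ranked = sorted(effects.items(), key=lambda item: item[1])
    negative = ranked[:k]        # ascending order: most negative first
    positive = ranked[::-1][:k]  # descending order: most positive first
    return positive, negative

# Hypothetical toy values: 2-dimensional residual stream, 4-token vocabulary.
decoder_direction = [1.0, 0.5]
unembed = {
    "tradition": [0.8, 0.6],
    "norms": [0.7, 0.2],
    "gins": [-0.5, -0.4],
    "zon": [-0.3, 0.1],
}
positive, negative = top_and_bottom(logit_effects(decoder_direction, unembed), k=2)
print([token for token, _ in positive])  # prints ['tradition', 'norms']
print([token for token, _ in negative])  # prints ['gins', 'zon']
```

A real model has tens of thousands of tokens, so the dashboard shows only the top ten at each end.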
Activation Density: 0.144%
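Activation density is usually reported as the percentage of sampled tokens on which the feature's activation is above zero. Under that assumption, 0.144% means the feature fires on roughly 1 to 2 tokens per thousand. A minimal sketch; the `activation_density` helper and the `sample` list are hypothetical.

```python
from typing import Sequence

def activation_density(activations: Sequence[float]) -> float:
    """Percentage of tokens whose feature activation is strictly positive."""
    if not activations:
        raise ValueError("need at least one token activation")
    active = sum(1 for a in activations if a > 0)
    return 100.0 * active / len(activations)

# Hypothetical sample: 2 active tokens out of 1,000.
sample = [0.0] * 998 + [1.7, 0.4]
print(f"{activation_density(sample):.3f}%")  # prints "0.200%"
```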