INDEX
Explanations
phrases related to honesty and ethical behavior
mentions of integrity in various contexts
Negative Logits
opa              -0.70
sg               -0.68
enegger          -0.68
Advertisements   -0.68
Stock            -0.68
Adapter          -0.66
Jet              -0.66
Pain             -0.65
hop              -0.65
Ingredients      -0.65
Positive Logits
rity              1.23
integrity         1.19
Integrity         1.00
acies             0.95
orously           0.87
ulence            0.84
safeguards        0.77
otiation          0.77
amental           0.77
preservation      0.76
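The two lists are the tokens whose logits the feature pushes down or up the most. The sketch below shows how such lists could be produced by sorting a token-to-logit mapping. It assumes the dashboard simply ranks per-token logit effects; the `logit_effects` dictionary (a few entries copied from the tables) and the `top_k` helper are hypothetical, for illustration only.

```python
# Hypothetical sketch: a handful of token -> logit effects copied from the
# tables; a real dashboard would rank every token in the vocabulary.
logit_effects = {
    "opa": -0.70, "sg": -0.68, "Stock": -0.68, "Pain": -0.65,
    "rity": 1.23, "integrity": 1.19, "Integrity": 1.00, "acies": 0.95,
}


def top_k(effects: dict[str, float], k: int, largest: bool = True) -> list[tuple[str, float]]:
    """Return the k tokens with the largest (or, if largest=False, smallest) effect.

    sorted() is stable, so tied values keep their insertion order.
    """
    return sorted(effects.items(), key=lambda kv: kv[1], reverse=largest)[:k]


positive = top_k(logit_effects, 3)
negative = top_k(logit_effects, 2, largest=False)
print(positive)  # [('rity', 1.23), ('integrity', 1.19), ('Integrity', 1.0)]
print(negative)  # [('opa', -0.7), ('sg', -0.68)]
```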
Activations Density 0.009%
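A density of 0.009% corresponds to roughly 9 active tokens per 100,000, so this feature fires rarely. The sketch below assumes density means the fraction of token positions where the feature's activation is above zero. The `activation_density` helper and the toy activation list are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (made up): one active position out of 10,000 tokens.
acts = [0.0] * 10_000
acts[42] = 3.1
print(f"{activation_density(acts):.3%}")  # 0.010%
```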