INDEX
Explanations
phrases emphasizing honesty, fairness, and self-reflection
Follows "to be" and relates to honesty/fairness
to be honest
Negative Logits
ⓧ              -0.63
SharedDtor     -0.54
Спољашње       -0.49
이션           -0.47
/******/       -0.46
ymce           -0.45
pageContext    -0.45
verna          -0.43
่านั้น          -0.43
Pautan         -0.42
Positive Logits
truth          1.84
honestly       1.67
frankly        1.58
Truth          1.55
truth          1.52
honest         1.48
Truth          1.45
honestly       1.43
Honestly       1.43
truthfully     1.39
Activations Density 0.124%
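
As a rough illustration of what a density figure like the one above could mean, the sketch below treats activation density as the fraction of token positions at which the feature's activation is above a threshold. This definition, the function name `activation_density`, the `threshold` parameter, and the sample data are all assumptions made for illustration; the page does not state how its 0.124% value was computed.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumed definition for illustration only: the source page does not
    specify how its density figure is calculated.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 12 of which activate the feature.
sample = [0.0] * 9988 + [0.7] * 12
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.124% would mean the feature fires on roughly one token in every 800, which fits a feature tied to a narrow set of phrases.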