INDEX
    Explanations

    phrases emphasizing honesty, fairness, and self-reflection

    Follows "to be" and relates to honesty/fairness

    Negative Logits
    (token not rendered)  -0.63
    "SharedDtor"          -0.54
    "Спољашње"            -0.49
    "이션"                -0.47
    "/******/"            -0.46
    "ymce"                -0.45
    " pageContext"        -0.45
    "verna"               -0.43
    "่านั้น"               -0.43
    "Pautan"              -0.42
    Positive Logits
    " truth"              1.84
    " honestly"           1.67
    " frankly"            1.58
    "Truth"               1.55
    "truth"               1.52
    " honest"             1.48
    " Truth"              1.45
    "honestly"            1.43
    " Honestly"           1.43
    " truthfully"         1.39
    Act Density 0.124%
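    A minimal sketch of the arithmetic behind the density figure. It assumes that Act Density is the percentage of sampled tokens on which the feature's activation is greater than zero. The dashboard does not state this definition or the size of its sample. The function name `activation_density` and the 100,000-token example are hypothetical and chosen only for illustration: at 0.124%, the feature would fire on roughly 124 of every 100,000 tokens.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "Act Density" = share of tokens with activation > 0.
    """
    if not activations:
        raise ValueError("need at least one token activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: 100,000 tokens, the feature fires on 124 of them.
acts = [0.0] * 100_000
for i in range(124):
    acts[i * 800] = 1.5  # arbitrary positive activation value

print(f"Act Density {activation_density(acts):.3f}%")  # Act Density 0.124%
```

    Because the feature is this sparse, only a very small fraction of tokens in a sample would be expected to surface as activation examples.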

    No Known Activations