    Explanations

    expressions emphasizing certainty or assurance

    Negative Logits
    Token       Logit
    "845"       -0.17
    "strar"     -0.17
    " Homo"     -0.15
    " Trial"    -0.15
    "rial"      -0.15
    "isz"       -0.15
    "546"       -0.15
    "ầm"        -0.14
    "967"       -0.14
    "rint"      -0.14
    Positive Logits
    Token          Logit
    " fair"        0.25
    " honest"      0.22
    " frank"       0.20
    " candid"      0.20
    "fair"         0.19
    " ped"         0.19
    " Fair"        0.18
    " precise"     0.18
    " perfectly"   0.17
    " blunt"       0.17
    Activation Density: 0.020%

    No Known Activations