    Explanations

    words related to trustworthiness and credibility

    terms related to worthiness or merit

    Negative Logits
    Token         Logit
    "ACA"         -0.71
    "oan"         -0.66
    "WAR"         -0.66
    "eq"          -0.65
    "udeb"        -0.64
    " Wah"        -0.63
    "ATHER"       -0.61
    "hran"        -0.60
    "ERA"         -0.60
    " Shank"      -0.60

    Positive Logits
    Token         Logit
    "worthy"       1.13
    "nesses"       1.04
    "ness"         0.89
    "lihood"       0.87
    "orthy"        0.74
    "worthiness"   0.74
    "iaries"       0.72
    " worthy"      0.70
    "otine"        0.70
    "icles"        0.69

    Activation Density: 0.012%

    No known activations.