INDEX
    Explanations

    statements related to belief or trust

    phrases that express belief or trust

    Negative Logits
    imilar    -0.77
    aida      -0.71
    idian     -0.70
     sidel    -0.66
    entin     -0.66
    undy      -0.66
    inav      -0.65
    imer      -0.65
    ertation  -0.65
    ija       -0.63
    Positive Logits
     when        0.72
     WHEN        0.65
     Yourself    0.61
     hype        0.60
     yourselves  0.57
     unless      0.57
     expr        0.56
     whenever    0.56
     eminent     0.56
     }}          0.55
    Act Density 0.069%

    No Known Activations