INDEX
    Explanations

    personal opinions or beliefs expressed in texts

    expressions of personal knowledge or certainty

    Negative Logits
    Token             Logit
    "luster"          -0.82
    " moderation"     -0.69
    "emale"           -0.68
    " smashing"       -0.65
    "flation"         -0.62
    " Reneg"          -0.62
    "gren"            -0.61
    " festive"        -0.60
    "interstitial"    -0.60
    "Dro"             -0.59
    Positive Logits
    Token             Logit
    " know"           1.78
    "know"            1.64
    " KNOW"           1.64
    " knew"           1.61
    " Know"           1.60
    " knows"          1.57
    "Know"            1.55
    " knowing"        1.42
    "Knowing"         1.25
    " knowledge"      1.24
    Act Density 0.285%

    No Known Activations