INDEX
    Explanations

    expressions of certainty and understanding in discussions

    Negative Logits
    Token        Logit
    "uraa"       -0.18
    "ENAME"      -0.17
    "lero"       -0.16
    " Bec"       -0.15
    "orage"      -0.15
    "ertext"     -0.14
    "bearer"     -0.14
    "_bridge"    -0.14
    "ÃĸL"        -0.14
    " vere"      -0.14
    Positive Logits
    Token            Logit
    " knows"         0.17
    ".experimental"  0.17
    "urgeon"         0.17
    " Permanent"     0.16
    "çŁ¥éģĵ"         0.15
    "mpr"            0.15
    "ัà¸į"           0.15
    "ahn"            0.15
    " know"          0.15
    " permanent"     0.14
    Activation Density: 0.176%

    No Known Activations