INDEX
    Explanations

    instances of personal pronouns and expressions of uncertainty or lack of knowledge

    Negative Logits
    "utdown"    -0.15
    "_almost"   -0.15
    "ura"       -0.14
    "åŀ"        -0.14
    "ugins"     -0.14
    "ymm"       -0.14
    "oubles"    -0.14
    "resi"      -0.13
    "=title"    -0.13
    " Jag"      -0.13
    Positive Logits
    "不知道"      0.36
    " unknown"  0.35
    " descon"   0.34
    "unknown"   0.33
    " don"      0.31
    "Unknown"   0.31
    " Unknown"  0.30
    " UNKNOWN"  0.30
    " unsure"   0.29
    "_unknown"  0.28
    Act Density 0.231%

    No Known Activations