INDEX
    Explanations

    questions or expressions of uncertainty

    phrases that pose questions about understanding or explanations

    Negative Logits
    "Laughs"          -0.72
    "otti"            -0.68
    "OGR"             -0.60
    "rive"            -0.60
    "enza"            -0.59
    "iva"             -0.59
    "©¶æ¥µ"           -0.58
    "ONG"             -0.58
    "ey"              -0.57
    "horn"            -0.57

    Positive Logits
    " why"            1.81
    " whether"        1.61
    " WHY"            1.59
    " how"            1.49
    "why"             1.42
    " whence"         1.28
    " what"           1.27
    "whether"         1.21
    " HOW"            1.14
    " whereabouts"    1.12
    Act Density 0.174%

    No Known Activations