INDEX
    Explanations

    questions starting with "can" or its variations

    Negative Logits
    "ather"       -0.18
    "gel"         -0.16
    "anke"        -0.16
    "ATHER"       -0.16
    "mits"        -0.15
    "athers"      -0.15
    "lify"        -0.15
    "_SECURE"     -0.15
    "chen"        -0.14
    "ois"         -0.14

    Positive Logits
    " you"         0.24
    "'t"           0.24
    "’t"           0.24
    " we"          0.23
    " someone"     0.21
    "uto"          0.18
    "va"           0.18
    "onic"         0.17
    "apes"         0.17
    "opy"          0.17

    Act Density 0.028%

    No Known Activations