INDEX
    Explanations

    phrases that express assumptions or hypotheses about a situation

    Negative Logits
    "ohn"       -0.17
    "/misc"     -0.15
    "esel"      -0.15
    "esson"     -0.14
    "estro"     -0.14
    "961"       -0.14
    "еÑģÑĤ"      -0.13
    "bes"       -0.13
    "edu"       -0.13
    " Fi"       -0.13
    Positive Logits
    " says"      0.19
    " looks"     0.19
    " LOOK"      0.17
    " look"      0.17
    "ckett"      0.16
    "Look"       0.15
    " failing"   0.15
    "States"     0.15
    "weet"       0.15
    " Looks"     0.15
    Activation Density: 0.086%

    No Known Activations