    Explanations

    explanations or reasoning in a text

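    Top logits (token strings are quoted so leading spaces stay visible)

    Negative Logits
    Token         Logit
    "icipated"    -0.66
    "apers"       -0.66
    "rift"        -0.64
    "iership"     -0.63
    "RAW"         -0.63
    "actionDate"  -0.62
    "shaw"        -0.61
    "vez"         -0.60
    "display"     -0.60
    "ourse"       -0.58

    Positive Logits
    Token         Logit
    " yeah"       1.04
    "yeah"        1.04
    "hhh"         0.99
    "hhhh"        0.95
    " prest"      0.94
    " Yeah"       0.93
    " kidding"    0.92
    " pardon"     0.89
    "mmm"         0.88
    " yea"        0.87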
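
    Logit tables like the one above are commonly produced in SAE feature dashboards by projecting the feature's decoder direction through the model's unembedding matrix, then ranking tokens by the result. The following is a minimal sketch that assumes this procedure. The function names `logit_weights` and `top_and_bottom` and the toy two-dimensional weights are hypothetical illustrations. They are not taken from this page or from any specific library.

```python
def logit_weights(decoder_dir, unembed):
    """Project a feature's decoder direction onto each token's unembedding column.

    decoder_dir: list of d_model floats (hypothetical feature output direction).
    unembed: dict mapping token string -> list of d_model floats (one column of W_U).
    Returns a dict mapping token -> scalar direct logit contribution.
    """
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }


def top_and_bottom(weights, k):
    """Return the k most positive and the k most negative (token, weight) pairs."""
    ranked = sorted(weights.items(), key=lambda kv: kv[1])  # ascending by weight
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy 2-d example; real models use thousands of dimensions and tokens.
    w = logit_weights(
        [1.0, -0.5],
        {" yeah": [1.0, 0.0], "hhh": [0.5, -0.5], "rift": [-0.5, 0.5], "RAW": [0.0, 1.0]},
    )
    pos, neg = top_and_bottom(w, 2)
    print(pos)  # [(' yeah', 1.0), ('hhh', 0.75)]
    print(neg)  # [('rift', -0.75), ('RAW', -0.5)]
```

    Because tokenizers treat " yeah" and "yeah" as distinct tokens, each gets its own unembedding column and therefore its own row in the ranking.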
    Act Density 0.648%
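
    Act Density is usually reported as the percentage of token positions on which the feature's activation is nonzero. Under that definition, 0.648% corresponds to roughly 6 or 7 active tokens per 1,000. The following is a minimal sketch that assumes this definition and a hypothetical list of per-token activations. The function name `activation_density` and its `threshold` parameter are illustrative only.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the activation exceeds `threshold`.

    activations: hypothetical list of per-token feature activations.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # 3 active positions out of 1,000 tokens.
    acts = [0.0] * 997 + [1.2, 0.3, 2.5]
    print(activation_density(acts))  # 0.3
```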

    No Known Activations