INDEX
    Explanations

    phrases that highlight significant initial actions or observations

    Negative Logits
    ancies       -0.91
    doms         -0.79
    sports       -0.76
    sung         -0.74
    contin       -0.73
    raph         -0.73
    etheless     -0.72
     Journals    -0.71
    rw           -0.68
    TPP          -0.67
    Positive Logits
     reaction      0.84
     responders    0.81
     introdu       0.80
     foremost      0.77
     sentence      0.71
     blush         0.70
     temptation    0.70
     checkout      0.69
     hurdle        0.67
     knocks        0.66
    Act Density 0.052%

    No Known Activations