INDEX
    Explanations

    phrases indicating comparison or contrast in various contexts

    Negative Logits
    ulp        -0.17
    èĴĤ        -0.15
    istics     -0.15
    amba       -0.15
     cÃłng     -0.14
     Anthem    -0.14
    ucht       -0.14
    itals      -0.13
    .weixin    -0.13
     kako      -0.13
    Positive Logits
     happened  0.29
     happens   0.24
     occurred  0.23
     happ      0.23
     during    0.22
     with      0.22
     occurs    0.22
     happen    0.21
     in        0.21
     done      0.20
    Activation Density: 0.143%

    No Known Activations