    Explanations

    phrases indicating comparison or contrast

    Negative Logits
    rose          -0.17
    one           -0.14
    řízení        -0.14
    INET          -0.13
    neau          -0.13
    ernet         -0.13
     eher         -0.13
    noon          -0.13
     Girlfriend   -0.13
    PI            -0.13
    Positive Logits
    iating        0.21
    iates         0.20
    iator         0.20
    aland         0.19
    /div          0.18
     between      0.18
    iable         0.17
    iability      0.17
    iators        0.16
    nowrap        0.15
    Activation Density: 0.070%

    No Known Activations