INDEX
    Explanations
    NEGATIVE LOGITS
    Token              Logit
    " rub"             -1.62
    "Rub"              -1.41
    " Rub"             -1.30
    "rub"              -1.27
    " RUB"             -1.21
    " rubbed"          -1.18
    " rubbing"         -1.16
    " rubs"            -1.13
    "RUB"              -1.08
    "HasAnnotation"    -0.84
    POSITIVE LOGITS
    Token              Logit
    "s"                 0.86
    "bers"              0.60
    "bles"              0.59
    "bing"              0.59
    "sn"                0.58
    "bish"              0.58
    "sin"               0.57
    "sack"              0.56
    "sid"               0.55
    "sy"                0.55
    Activation Density: 0.194%

    No Known Activations