    Negative Logits
    Token             Logit
    " моро"           -0.94
    " Hearings"       -0.83
    " mayonnaise"     -0.83
    " lada"           -0.81
    " Barbecue"       -0.80
    " Obedience"      -0.80
    " frosts"         -0.80
    " dres"           -0.80
    " emplace"        -0.79
    "働く"            -0.79
    Positive Logits
    Token             Logit
    " bath"           2.27
    "bath"            1.62
    " baths"          1.48
    " bathtub"        1.39
    " relaxing"       1.38
    " Bath"           1.32
    " tub"            1.26
    " Epsom"          1.25
    " soak"           1.25
    "🛀"              1.16
    Activation Density: 0.010%

    No Known Activations