INDEX
    Explanations

    phrases indicating assurance or certainty about outcomes

    Negative Logits
    Token         Logit
    "ild"         -0.15
    "bay"         -0.14
    "burgh"       -0.14
    "ardo"        -0.14
    ".reducer"    -0.14
    "mailer"      -0.14
    "ko"          -0.14
    " éķ"         -0.14
    "-scale"      -0.13
    " tallest"    -0.13
    Positive Logits
    Token         Logit
    "ably"        0.23
    "/prom"       0.19
    "anteed"      0.19
    "/request"    0.16
    "ingly"       0.16
    "antee"       0.15
    " "           0.15
    "ment"        0.15
    "ÌĨ"          0.15
    "ing"         0.14
    Activation Density: 0.027%

    No Known Activations