INDEX
    Explanations

    phrases that indicate success or guarantees in various contexts

    Negative Logits
    Token       Logit
    "kir"       -0.14
    "ogo"       -0.14
    "uch"       -0.14
    " Gä"       -0.14
    "orld"      -0.13
    "uli"       -0.13
    "lse"       -0.13
    "sdale"     -0.13
    "kv"        -0.13
    "288"       -0.13
    Positive Logits
    Token          Logit
    "/full"        0.18
    "edly"         0.18
    " pure"        0.17
    "?url"         0.16
    " Pure"        0.16
    " fled"        0.15
    " accurate"    0.14
    "-addons"      0.14
    "-ajax"        0.14
    "chedulers"    0.14
    Activation Density: 0.030%

    No Known Activations