    Explanations

    Phrases indicating inability or negation, particularly involving the word "not".

    Negative Logits
    "Références"      -0.39
    " inicios"        -0.39
    "tênis"           -0.39
    " gonna"          -0.39
    "belongs"         -0.38
    "łóż"             -0.37
    "dealloc"         -0.37
    [not rendered]    -0.37
    "subpackage"      -0.36
    " harms"          -0.36
    Positive Logits
    " could"          0.73
    " Could"          0.69
    "Could"           0.67
    " couldnt"        0.64
    "could"           0.64
    " Couldn"         0.62
    " kunde"          0.58
    " couldn"         0.57
    "Couldn"          0.56
    " COULD"          0.56
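    Lists like the two above are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and ranking the resulting per-token scores. The page does not state its method, so the sketch below is an assumption, not this dashboard's implementation: `logit_effects`, `decoder_dir`, and `unembed` are hypothetical names, and the toy numbers are invented, not taken from this feature.

```python
from typing import Sequence


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Score every vocabulary token by decoder_dir @ unembed (assumed method).

    Assumption: unembed[i][j] is the weight from residual dimension i to token j.
    Returns (top_k_positive, top_k_negative), each a list of (token, score).
    """
    if len(unembed) != len(decoder_dir):
        raise ValueError("unembed must have one row per residual dimension")
    scores = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative scores first
    positive = ranked[::-1][:k]  # most positive scores first
    return positive, negative


# Hypothetical toy data, NOT from this feature: 2 residual dims, 3 tokens.
vocab = [" could", " belongs", " gonna"]
decoder_dir = [1.0, -0.5]
unembed = [[0.7, 0.1, -0.2],
           [-0.1, 0.4, 0.3]]
top, bottom = logit_effects(decoder_dir, unembed, vocab, k=1)
print(top[0][0], bottom[0][0])
```

    Sorting once and slicing from both ends keeps the positive and negative lists consistent with each other; a real model would use a matrix library rather than nested Python loops.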
    Activation density: 0.015%
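    Activation density is usually read as the percentage of sampled tokens on which the feature fires: density = 100 × (firing tokens / total tokens). Under that reading, 0.015% corresponds to about 15 firing tokens per 100,000. The page does not define the token sample or the firing threshold, so the helper `activation_density` below is hypothetical and assumes "fires" means an activation strictly above zero.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical sample: 20,000 tokens, the feature fires on 3 of them.
sample = [0.0] * 20_000
for idx in (17, 4_210, 19_999):
    sample[idx] = 1.2
print(f"{activation_density(sample):.3f}%")  # → 0.015%
```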

    No Known Activations