INDEX
    Explanations

    words related to incorrectness or errors

    instances of the word "wrong" and its variations, indicating errors or failures

    NEGATIVE LOGITS
    "enance"        -0.73
    "ILA"           -0.72
    " Ri"           -0.67
    " Swim"         -0.66
    " Flavoring"    -0.64
    "tsky"          -0.62
    " Chill"        -0.60
    " CARE"         -0.60
    " Colbert"      -0.59
    "kamp"          -0.59
    POSITIVE LOGITS
    "headed"         1.40
    "fully"          1.39
    "doing"          1.04
    "do"             1.03
    "fulness"        0.98
    "ful"            0.91
    "footed"         0.90
    "sight"          0.89
    "behavior"       0.89
    "dest"           0.89
    Activation Density: 0.029%

    No Known Activations