    Explanations

    phrases related to errors and mistakes

    mentions of errors and issues with performance or accuracy

    Negative Logits
    "electric"    -0.78
    "tsky"        -0.78
    "apeake"      -0.77
    "apy"         -0.76
    "amen"        -0.75
    "nai"         -0.75
    "estine"      -0.74
    "edom"        -0.73
    "bledon"      -0.72
    "rons"        -0.70
    Positive Logits
    "ously"           0.85
    "gered"           0.83
    " guiActiveUn"    0.82
    " margin"         0.79
    "uracy"           0.78
    " error"          0.75
    " deceive"        0.71
    " prone"          0.70
    " dece"           0.69
    " errors"         0.68
    Act Density 0.022%

    No Known Activations