INDEX
    Explanations

    words related to negative feedback or disapproval

    terms related to critique or disapproval

    Negative Logits
    "cise"          -0.70
    "frey"          -0.66
    "tre"           -0.64
    "eret"          -0.63
    "ovember"       -0.60
    "pared"         -0.60
    " stocking"     -0.60
    "tein"          -0.60
    "coat"          -0.59
    "ipeg"          -0.59

    Positive Logits
    " criticism"     0.95
    "代"             0.93
    " critic"        0.91
    " critics"       0.86
    " criticisms"    0.86
    " leveled"       0.83
    "arial"          0.82
    " critiques"     0.80
    "naires"         0.77
    "龍契士"         0.76
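    Lists like these are often produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking the vocabulary by the result. The sketch below assumes that approach. It is not necessarily how this dashboard computed its values. `top_and_bottom_logits`, `decoder_dir`, `W_U` and `vocab` are hypothetical names, and any final LayerNorm scaling a real pipeline might apply is ignored.

```python
import numpy as np


def top_and_bottom_logits(decoder_dir, W_U, vocab, k=10):
    """Rank vocabulary tokens by a feature's direct effect on the logits.

    Hypothetical helper: assumes the effect is decoder_dir @ W_U, with no
    final LayerNorm correction.
    decoder_dir: shape (d_model,); W_U: shape (d_model, vocab_size).
    """
    scores = np.asarray(decoder_dir) @ np.asarray(W_U)  # shape (vocab_size,)
    order = np.argsort(scores)  # ascending: most negative scores first
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return positive, negative
```

    With `k=10`, the two returned lists would play the roles of the Positive Logits and Negative Logits columns.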
    Activation Density: 0.019%
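    Activation density is commonly reported as the fraction of sampled tokens on which the feature fires. The minimal sketch below assumes that "fires" means an activation strictly above a threshold of zero. The function name `activation_density` and its `threshold` parameter are hypothetical.

```python
import numpy as np


def activation_density(feature_acts, threshold=0.0):
    """Fraction of tokens whose activation for this feature exceeds threshold."""
    acts = np.asarray(feature_acts, dtype=float)
    # Boolean mask of active tokens; its mean is the active fraction.
    return float(np.mean(acts > threshold))
```

    Under this definition, a density of 0.019% means the feature is active on roughly 19 of every 100,000 tokens.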

    No Known Activations