    Explanations

    words related to negative evaluation or disapproval

    instances of the word "criticism."

    Negative Logits
    Token          Logit
    "tre"          -0.72
    "cise"         -0.70
    "locked"       -0.70
    "elt"          -0.69
    "vol"          -0.66
    "gin"          -0.65
    "cop"          -0.65
    "NAS"          -0.65
    "xon"          -0.65
    "changing"     -0.64
    Positive Logits
    Token          Logit
    " criticism"    1.14
    " critic"       0.94
    "imaru"         0.93
    " criticisms"   0.93
    "代"            0.91
    " critics"      0.89
    "icism"         0.89
    " reviewers"    0.85
    " critiques"    0.84
    " critique"     0.84
    Activation Density: 0.013%

    No Known Activations