INDEX
    Explanations

    expressions related to negative judgment or criticism

    derogatory remarks about intelligence

    Negative Logits
    Token             Logit
    "AUT"             -0.92
    " largeDownload"  -0.88
    "APH"             -0.87
    "quart"           -0.76
    "cussion"         -0.69
    "HI"              -0.69
    "aver"            -0.67
    "RH"              -0.67
    "ILA"             -0.67
    "soType"          -0.66
    Positive Logits
    Token             Logit
    " stupid"         1.01
    "nesses"          0.93
    " silly"          0.85
    " Stupid"         0.84
    " dumb"           0.83
    "gery"            0.79
    "itude"           0.77
    "upid"            0.77
    "ulent"           0.77
    "ishly"           0.76
    Act Density 0.010%

    No Known Activations