    Explanations

    negative judgments or moral evaluations regarding actions and situations

    Negative Logits
    Token          Logit
    warm           -0.17
    かわ           -0.15
    Warm           -0.15
    انو            -0.15
    wine           -0.15
    loub           -0.14
    vrd            -0.14
    .GroupLayout   -0.14
    omanip         -0.14
    abbr           -0.14

    Positive Logits
    Token          Logit
    fully           0.25
    fulness         0.21
    s               0.19
    itude           0.18
    wrong           0.17
    /error          0.17
    誤 ("mistake")  0.17
    fu              0.16
    ful             0.16
    omers           0.16
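This page does not say how the logit lists were produced. A common convention for sparse-autoencoder dashboards is to project the feature's decoder direction through the model's unembedding matrix and report the tokens with the largest and smallest resulting scores. The sketch below assumes that convention; the function names, the toy two-dimensional direction, the tiny unembedding matrix, and the four-token vocabulary are all hypothetical and exist only for illustration.

```python
from typing import List, Sequence, Tuple


def logit_effects(direction: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction (length d_model) through an
    unembedding matrix (d_model rows x vocab_size columns), giving one
    score per vocabulary token. Assumed convention, not confirmed by the source."""
    vocab_size = len(unembed[0])
    return [
        sum(direction[i] * unembed[i][t] for i in range(len(direction)))
        for t in range(vocab_size)
    ]


def top_tokens(scores: Sequence[float], vocab: Sequence[str], k: int,
               largest: bool = True) -> List[Tuple[str, float]]:
    """Return the k tokens with the highest (largest=True) or lowest
    (largest=False) scores, paired with their rounded scores."""
    order = sorted(range(len(scores)), key=lambda t: scores[t], reverse=largest)
    return [(vocab[t], round(scores[t], 2)) for t in order[:k]]


# Hypothetical toy data: d_model = 2, vocabulary of 4 tokens.
direction = [1.0, -0.5]
unembed = [
    [0.2, -0.1, 0.3, 0.0],
    [0.4, 0.2, -0.2, 0.1],
]
vocab = ["wrong", "warm", "fully", "wine"]

scores = logit_effects(direction, unembed)
positive = top_tokens(scores, vocab, k=2, largest=True)
negative = top_tokens(scores, vocab, k=2, largest=False)
print("positive:", positive)
print("negative:", negative)
```

With real model weights, `direction` would be one row of the autoencoder's decoder and `unembed` the language model's output embedding; the ranking logic stays the same.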
    Activation Density: 0.017%
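Activation density is usually read as the percentage of evaluated tokens on which the feature fires (activation above zero). The sketch below assumes that definition and a threshold of 0; the function name and the synthetic activation list are hypothetical, chosen so that 17 firing tokens out of 100,000 reproduce the 0.017% figure shown above.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`.
    The threshold of 0.0 is an assumption, not taken from the source."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Synthetic example: the feature fires on 17 of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(17):
    acts[i * 5_000] = 1.0

density = activation_density(acts)
print(f"{density:.3f}%")
```

A density this low means the feature is very sparse, which is consistent with the page listing no known activation examples.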

    No Known Activations