    Explanations

    terms related to personal suffering or discomfort

    Head Attr Weights
    0:0.09
    1:0.07
    2:0.08
    3:0.09
    4:0.07
    5:0.07
    6:0.07
    7:0.07
    8:0.09
    9:0.09
    10:0.08
    11:0.06
    Negative Logits
    ewitness     -2.38
    renheit      -2.25
    omach        -2.23
    arted        -2.17
    irteen       -2.16
    aniel        -2.14
    untarily     -2.12
    ーティ       -2.11
    resy         -2.11
    emort        -2.09
    Positive Logits
    crop         2.07
    clus         2.04
     incentive   1.98
     BLM         1.97
     bip         1.97
     contribut   1.95
     combo       1.95
     bloc        1.95
     AB          1.94
     divest      1.94
    Act Density 0.000%

    No Known Activations