INDEX
    Explanations

    words related to harboring, protection, and ambivalence towards responsibility or wrongdoing

    Head Attr Weights
    0:0.02
    1:0.01
    2:0.07
    3:0.07
    4:0.14
    5:0.03
    6:0.05
    7:0.36
    8:0.04
    9:0.04
    10:0.05
    11:0.06
    Negative Logits
    "=>"              -1.38
    " disappear"      -1.36
    "imil"            -1.33
    "Enlarge"         -1.32
    "budget"          -1.32
    "gey"             -1.31
    "availability"    -1.30
    " chopping"       -1.30
    "merce"           -1.30
    "ifully"          -1.30
    Positive Logits
    " emotions"       1.62
    " optimism"       1.61
    " doubts"         1.53
    " feelings"       1.52
    " pent"           1.52
    " sidx"           1.51
                      1.46
    " emotion"        1.45
    " spoilers"       1.45
    "essim"           1.44
    Act Density 0.001%

    No Known Activations