INDEX
    Explanations

    actions and accusations related to blame and responsibility in societal and personal contexts

    Negative Logits
    ropp    -0.17
    ifold   -0.17
    dden    -0.17
    esso    -0.17
    anova   -0.15
    iloc    -0.15
    ivre    -0.15
    icana   -0.15
    ego     -0.14
    lical   -0.14
    Positive Logits
     somehow            0.16
     alt                0.16
     whenever           0.15
    ãĥ³ãĥĨãĤ£           0.15
    ela                 0.15
    Named               0.14
    elta                0.14
    longleftrightarrow  0.14
    .Microsoft          0.13
     bour               0.13
    Act Density 0.384%

    No Known Activations