INDEX
    Explanations

    actions related to supporting, helping, or harming others

    actions related to helping, harming, and the ethical implications of those actions

    Negative Logits
    Logit   Token
    -0.77   iHUD
    -0.77   :]
    -0.76   .}
    -0.73    thereof
    -0.70    hers
    -0.69    guiActiveUnfocused
    -0.67   ................................................................
    -0.65   ruff
    -0.64   .","
    -0.63   ItemThumbnailImage
    Positive Logits
    Logit   Token
    1.07     unsuspecting
    0.99     strangers
    0.96     hordes
    0.94     peoples
    0.94     passers
    0.93     politicians
    0.92     opponents
    0.88     enemies
    0.88     clients
    0.86     celebrities
    Activation Density: 0.654%

    No Known Activations