INDEX
    Explanations

    concepts related to morality and ethical considerations

    Negative Logits
    Token        Logit
    "Ws"         -0.20
    "WS"         -0.19
    "WB"         -0.19
    "WF"         -0.19
    "W"          -0.19
    "WL"         -0.19
    "WM"         -0.19
    "(W"         -0.18
    "WP"         -0.18
    "/W"         -0.18

    Positive Logits
    Token        Logit
    " width"      0.45
    " wide"       0.42
    ";width"      0.41
    " wealth"     0.40
    " weakness"   0.39
    " weak"       0.39
    " widths"     0.39
    " weekly"     0.38
    " worst"      0.38
    " widening"   0.37

    Activation Density: 0.252%

    No Known Activations