    Explanations

    concepts related to moral and ethical dilemmas

    Negative Logits
    Token             Logit
    "intro"           -0.19
    " introduction"   -0.17
    "836"             -0.17
    "akov"            -0.15
    " introducing"    -0.15
    "olas"            -0.15
    "roman"           -0.15
    "é¼ĵ"             -0.14
    "835"             -0.14
    "dit"             -0.14
    Positive Logits
    Token             Logit
    " others"         0.21
    " someone"        0.20
    " somebody"       0.20
    " anybody"        0.20
    " otherwise"      0.20
    " unintention"    0.20
    " anyone"         0.20
    " everybody"      0.20
    " everyone"       0.19
    "someone"         0.19
    Act Density 0.028%

    No Known Activations