INDEX
    Explanations

    references to moral and ethical concepts

    Negative Logits
    "elong"       -0.18
    "getMethod"   -0.16
    "elan"        -0.15
    "thinkable"   -0.15
    "elles"       -0.15
    "esis"        -0.14
    "اÙĪØ±ÛĮ"     -0.14
    "el"          -0.14
    "enheim"      -0.14
    "аÑĤив"       -0.14
    Positive Logits
    "izing"    0.21
    " fiber"   0.19
    "Mor"      0.19
    " fibre"   0.18
    "lez"      0.18
    " Mor"     0.18
    " Moral"   0.17
    "izin"     0.17
    "ized"     0.17
    "istic"    0.17
    Activation Density: 0.015%

    No Known Activations