INDEX
    Explanations

    constructs that indicate moral or ethical dilemmas

    Negative Logits
    Token        Logit
    "irsch"      -0.19
    "unga"       -0.15
    "ults"       -0.14
    " ff"        -0.14
    " fe"        -0.14
    " AM"        -0.13
    "?type"      -0.13
    "'"          -0.13
    " Granny"    -0.13
    "ackBar"     -0.13
    Positive Logits
    Token        Logit
    "анк"        0.16
    "essen"      0.15
    "itler"      0.15
    "ì§ĵ"        0.14
    "каÑģ"       0.14
    "è¬"         0.14
    "Coding"     0.13
    ":convert"   0.13
    "ÙħÙĨد"      0.13
    "åĩĢ"        0.13
    Activation Density: 0.722%

    No Known Activations