INDEX
    Explanations

    legal and ethical constraints regarding actions and behaviors

    Negative Logits
    Token            Logit
    "ogen"           -0.15
    "chos"           -0.15
    "lett"           -0.15
    " Intercept"     -0.15
    "å¾Ĵ"            -0.15
    "oggler"         -0.14
    " alphabet"      -0.14
    "iser"           -0.14
    " geçir"         -0.14
    " Sie"           -0.14

    Positive Logits
    Token            Logit
    " illegal"        0.39
    "illegal"         0.35
    " against"        0.31
    " Illegal"        0.30
    " prohibited"     0.29
    " frowned"        0.29
    "Illegal"         0.28
    " grounds"        0.28
    "against"         0.28
    " forbidden"      0.26
    Activation Density: 0.221%

    No Known Activations