INDEX
    Explanations

    statements related to moral and ethical principles

    Negative Logits
    лаш
    -0.16
     Worldwide
    -0.15
    alars
    -0.15
    łu
    -0.15
    altar
    -0.15
    serter
    -0.14
    å½
    -0.14
    ervers
    -0.14
    ulong
    -0.14
    zion
    -0.14
    Positive Logits
     Nack
    0.17
     nor
    0.16
    itis
    0.15
    JOR
    0.15
    oader
    0.15
    団
    0.14
    489
    0.14
     Nor
    0.14
     SOCK
    0.14
    ronic
    0.13
    Act Density 0.248%

    No Known Activations