INDEX
    Explanations

    phrases related to ensuring compliance and safety

    Negative Logits
    "NetMessage"      -0.89
    "pmwiki"          -0.84
    "ibble"           -0.77
    " speculate"      -0.71
    "onnaissance"     -0.71
    "ulia"            -0.71
    " Difficulty"     -0.65
    " Finder"         -0.65
    " Wonders"        -0.64
    "rys"             -0.64
    Positive Logits
    " properly"       1.28
    " adequately"     1.19
    " safe"           1.11
    " appropriately"  1.05
    " compliant"      1.03
    " not"            1.02
    " complying"      1.01
    " respectful"     0.98
    " correctly"      0.96
    " sufficiently"   0.93
    Activation Density: 0.173%

    No Known Activations