INDEX
    Explanations

    Explicit meta-instructions about how to respond, especially prohibitions, formatting requirements, and directives to list items or provide examples.

    Negative Logits
     Wildlife
    0.37
     líquidos
    0.36
     an
    0.34
    基于
    0.33
     Bone
    0.32
     Plastics
    0.32
     ক্যান্সার
    0.32
     Bikini
    0.31
     Skin
    0.31
     Muscle
    0.31
    Positive Logits
     גם
    0.45
    meno
    0.45
     murderous
    0.43
     ezek
    0.42
    0.42
     هذا
    0.41
     nonchal
    0.41
    0.41
     disdain
    0.41
     concom
    0.40
    Act Density 0.397%

    No Known Activations