INDEX
    Explanations

    phrases related to issues of safety and risk management

    avoiding undesirable outcomes

    Negative Logits
    " CreateTagHelper"    -0.69
    " مرئيه"              -0.68
    " насељу"             -0.63
    "出版年"               -0.63
    "+#+#"                -0.61
    " للمعارف"            -0.58
    " Préférences"        -0.56
    "}}^"                 -0.54
    " Вікі"               -0.54
    "KommentareTeilen"    -0.50
    Positive Logits
    " avoid"              0.56
    " avoided"            0.55
    " avoids"             0.54
    " avoiding"           0.54
    "Avoid"               0.50
    "Avoiding"            0.49
    " Avoid"              0.49
    "avoid"               0.48
    " AVOID"              0.44
    " Avoiding"           0.43
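    Logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the highest and lowest scores. This page does not say how its lists were computed, so the sketch below assumes that method. The functions `logit_effects` and `top_and_bottom_tokens`, and the toy matrices, are hypothetical illustrations, not this dashboard's code.

```python
# Hypothetical sketch (assumed method): rank vocabulary tokens by how strongly
# a feature's decoder direction raises or lowers each token's logit.
# Assumes decoder_dir has length d_model, unembed is a d_model x vocab matrix
# stored as a list of rows, and vocab lists one token string per column.

def logit_effects(decoder_dir, unembed):
    """Return decoder_dir @ unembed: one score per vocabulary token."""
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        for j in range(vocab_size)
    ]

def top_and_bottom_tokens(decoder_dir, unembed, vocab, k):
    """Return (positive, negative) lists of (token, score), k entries each."""
    scores = logit_effects(decoder_dir, unembed)
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # most suppressed tokens, most negative first
    positive = ranked[::-1][:k]  # most promoted tokens, most positive first
    return positive, negative

# Toy example: d_model = 2 and a 4-token vocabulary (made-up numbers).
vocab = [" avoid", " Avoid", "CreateTagHelper", " Вікі"]
unembed = [
    [0.5, 0.4, -0.6, -0.1],
    [0.1, 0.2, -0.2, -0.5],
]
decoder_dir = [1.0, 0.5]
positive, negative = top_and_bottom_tokens(decoder_dir, unembed, vocab, k=2)
print(positive)
print(negative)
```

    Scoring against the unembedding gives a view of the feature's direct effect on the output vocabulary only. It ignores any influence the feature has through later layers.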
    Act Density 0.061%
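    Activation density is usually read as the share of token positions on which the feature fires at all. The sketch below assumes "fires" means an activation strictly greater than zero. The evaluation dataset and threshold behind the 0.061% figure are not given here, and `activation_density` is a hypothetical helper.

```python
# Hypothetical sketch: activation density as the percentage of token positions
# whose feature activation is strictly positive (assumed definition).

def activation_density(activations):
    """Return the percentage of positions with activation > 0."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > 0)
    return 100.0 * active / len(activations)

# 61 active positions out of 100,000 corresponds to a density of 0.061%.
acts = [0.0] * 100_000
for i in range(61):
    acts[i * 1000] = 1.0
density = activation_density(acts)
print(f"Act Density {density:.3f}%")
```

    A density this low means the feature is silent on roughly 99.94% of tokens, which fits a narrowly scoped concept such as avoidance.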

    No Known Activations