INDEX
    Explanations

    phrases related to danger or potential harm

    Negative Logits
        Token               Logit
        "iao"               -0.16
        "agues"             -0.16
        "ares"              -0.15
        "arez"              -0.15
        ".sharedInstance"   -0.15
        "ero"               -0.15
        "ignon"             -0.14
        "amin"              -0.14
        "-called"           -0.14
        "ПК"                -0.14
    Positive Logits
        Token               Logit
        " danger"            0.17
        " dangers"           0.17
        "lessly"             0.17
        "状"                 0.17
        "baar"               0.16
        "elm"                0.16
        "weigh"              0.16
        "jsp"                0.15
        "ĺ"                  0.14
        "ources"             0.14
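    Logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking every vocabulary token by the result. The page does not say how its lists were computed, so the sketch below assumes that method. `logit_effects` and `top_logits` are hypothetical helpers, and the toy matrix is invented for illustration; it is not this feature's data.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
) -> List[float]:
    """Project a feature's decoder direction through the unembedding.

    decoder_dir has length d_model; unembed is a d_model x vocab matrix
    (row i holds dimension i of every token's unembedding vector).
    Entry t of the result is the feature's direct effect on token t's logit.
    """
    vocab_size = len(unembed[0])
    return [
        sum(d * row[t] for d, row in zip(decoder_dir, unembed))
        for t in range(vocab_size)
    ]


def top_logits(
    effects: Sequence[float],
    vocab: Sequence[str],
    k: int,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most-promoted and the k most-suppressed tokens."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    positive = ranked[::-1][:k]  # largest effects first
    negative = ranked[:k]        # most negative effects first
    return positive, negative


# Toy example (invented numbers): d_model = 2, vocabulary of 4 tokens.
vocab = [" danger", "iao", "ares", " dangers"]
decoder_dir = [1.0, 0.5]
unembed = [
    [0.2, -0.1, 0.0, 0.3],
    [0.1, 0.2, -0.4, 0.0],
]
effects = logit_effects(decoder_dir, unembed)
positive, negative = top_logits(effects, vocab, k=2)
print("positive:", positive)
print("negative:", negative)
```

    Because only the decoder direction and the unembedding are involved, this measures the feature's direct effect on the output logits and ignores any influence routed through later layers.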
    Activation Density 0.034%
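    Activation density is the share of sampled tokens on which the feature is active. A figure of 0.034% works out to roughly one token in every 2,900 (1 / 0.00034 ≈ 2,941). The following is a minimal sketch that assumes "active" means an activation strictly greater than zero. The page states neither the threshold nor the sample size, and `activation_density` is a hypothetical helper.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of tokens on which the feature's activation exceeds `threshold`."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("activation_density needs at least one token")
    return active / total


# Toy data (invented): the feature fires on 34 of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(0, 34_000, 1_000):
    acts[i] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")  # 34 / 100,000 -> 0.034%
```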

    No Known Activations