    Explanations

    adjectives or phrases describing potential harm or risks

    references to danger and potentially harmful situations

    Negative Logits
    Token           Logit
    "olitan"        -0.82
    "roma"          -0.79
    "ļéĨĴ"          -0.78
    "mination"      -0.76
    "via"           -0.75
    "oration"       -0.75
    "zzo"           -0.75
    "hew"           -0.74
    "arthed"        -0.74
    "cedented"      -0.73

    Positive Logits
    Token           Logit
    " adolesc"       0.91
    " endanger"      0.81
    " undermin"      0.75
    " sounding"      0.74
    "nesses"         0.73
    " dangerous"     0.73
    " threats"       0.71
    " overdose"      0.71
    " combination"   0.70
    " Danger"        0.69

    Activation Density: 0.032%

    No Known Activations