    Explanations

    phrases related to danger and warning

    references to danger or harmfulness

    Negative Logits
    Token        Logit
    "via"        -0.83
    "rix"        -0.82
    "elle"       -0.75
    "ILA"        -0.75
    "ļéĨĴ"       -0.73
    "ARCH"       -0.72
    "angular"    -0.72
    "roma"       -0.72
    "ann"        -0.72
    "arger"      -0.71
    Positive Logits
    Token           Logit
    " dangerous"    1.11
    " undermin"     1.02
    " endanger"     1.00
    " adolesc"      0.91
    "danger"        0.89
    " danger"       0.87
    " hazardous"    0.85
    " mosqu"        0.84
    " dangers"      0.83
    " deadly"       0.80
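
The logit lists above name the tokens this feature pushes down or up the most. The source does not say how the scores are computed. A minimal sketch, assuming they come from projecting a feature direction through the model's unembedding matrix; `top_logit_effects`, the toy vocabulary, and all numbers are hypothetical:

```python
def top_logit_effects(direction, unembed, vocab, k=3):
    """Rank tokens by the logit change a feature direction would cause.

    direction: list of floats, length d_model (hypothetical feature direction)
    unembed:   list of d_model rows, each of length len(vocab)
    """
    # Logit effect per token = dot(direction, unembed column for that token).
    effects = [
        sum(direction[i] * unembed[i][j] for i in range(len(direction)))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]        # most suppressed tokens, lowest first
    positive = ranked[::-1][:k]  # most promoted tokens, highest first
    return negative, positive


# Toy 2-dimensional model with a 4-token vocabulary (made-up values).
vocab = [" dangerous", "danger", "via", "rix"]
unembed = [
    [1.0, 0.5, -0.5, -1.0],
    [0.5, 1.0, -1.0, -0.5],
]
negative, positive = top_logit_effects([1.0, 0.0], unembed, vocab, k=2)
```

Tokens are kept as exact strings, so " danger" (with a leading space) and "danger" rank as separate entries, as they do in the list above.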
    Act Density 0.015%
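
Activation density is commonly read as the share of sampled tokens on which the feature is active. The source does not define the metric, so the following is a minimal sketch under that assumption; `activation_density` and the sample data are hypothetical:

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of token positions whose activation exceeds threshold."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 3 active positions out of 20,000 sampled tokens (made-up data).
sample = [0.0] * 19_997 + [0.4, 1.2, 2.5]
density = activation_density(sample)
print(f"Act Density {density:.3f}%")
```

Under this reading, a density of 0.015% corresponds to roughly 3 active positions in every 20,000 sampled tokens.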

    No Known Activations