    Explanations

    references to danger and medical conditions involving health risks

    Negative Logits
    Token           Logit
     *"             -0.78
     AppComponent   -0.74
     #"             -0.74
    。"              -0.71
    ).[             -0.71
    ."              -0.70
     [@             -0.70
    )."             -0.69
     ["             -0.69
     ."             -0.69
    Positive Logits
    Token   Logit
    !       1.16
    ?       1.07
    !</     0.83
    !-      0.80
    !?      0.79
    ?!      0.79
    !');    0.77
    ?-      0.76
    -!      0.75
    ?</     0.73
    Activation Density: 0.022%

    No Known Activations