    Explanations

    phrases that indicate a potential risk or threat

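    Tokens are quoted; a leading space inside the quotes is part of the token.

    Negative Logits
      Token            Logit
      "UserScript"     -0.85
      "Datuak"         -0.81
      "Ivo"            -0.77
      "ensement"       -0.72
      "rfloor"         -0.71
      "Millisecond"    -0.70
      " ddelweddau"    -0.70
      " Verdun"        -0.68
      " cherchés"      -0.68
      "CONSIN"         -0.66

    Positive Logits
      Token            Logit
      " pose"           3.17
      " Pose"           2.94
      " poses"          2.93
      " posed"          2.88
      " posing"         2.74
      "Pose"            2.62
      "pose"            2.40
      "poses"           1.79
      "POSE"            1.67
      "posed"           1.67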
    Activation Density: 0.089%

    No Known Activations
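As a rough sketch of what the density figure measures, assuming it is the percentage of token positions on which the feature's activation is above zero (the function name, the zero threshold, and the sample data below are hypothetical illustrations, not taken from this dashboard):

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the feature activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature fires.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical data: the feature fires on 89 of 100,000 token positions.
acts = [0.0] * 100_000
for i in range(89):
    acts[i * 1000] = 1.5

print(f"{activation_density(acts):.3f}%")  # 0.089%
```

A density this low means the feature fires on fewer than one token in a thousand, which is consistent with a narrow, sparse feature.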