Explanations
phrases that indicate a potential risk or threat
Negative Logits
UserScript      -0.85
Datuak          -0.81
Ivo             -0.77
ensement        -0.72
rfloor          -0.71
Millisecond     -0.70
ddelweddau      -0.70
Verdun          -0.68
cherchés        -0.68
CONSIN          -0.66
Positive Logits
pose             3.17
Pose             2.94
poses            2.93
posed            2.88
posing           2.74
Pose             2.62
pose             2.40
poses            1.79
POSE             1.67
posed            1.67
Activation Density: 0.089%