INDEX
Explanations
content that is classified as offensive or contains warnings about sensitive material
content warnings and disclaimers about offensive, inappropriate, or restricted material.
Negative Logits
DockStyle    -0.33
cherchez     -0.30
bingung      -0.29
γνω          -0.29
GUILayout    -0.29
battre       -0.29
hubung       -0.28
envy         -0.26
dogged       -0.26
说明         -0.26
Positive Logits
sensitive        1.02
objectionable    1.01
offensive        1.01
censored         0.99
offensive        0.99
sensitive        0.96
inappropriate    0.96
Offensive        0.94
ensitive         0.93
Sensitive        0.92
Activations Density 0.427%