Explanations
terms related to legal violations
phrases indicating legal violations or infringements
Negative Logits
Token       Logit
ritz        -0.75
onna        -0.73
ppa         -0.72
retty       -0.69
dwind       -0.65
thanking    -0.64
borgh       -0.64
achine      -0.63
agne        -0.62
bang        -0.62
Positive Logits
Token       Logit
Ö¼          0.91
norms       0.82
NRS         0.77
instr       0.76
Contracts   0.75
orius       0.75
hibited     0.73
procedural  0.72
CLS         0.70
ãģį         0.70
Activations Density: 0.098%
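
The density figure is commonly read as the share of tokens on which the feature fires at all. The sketch below is a minimal illustration under that assumption; the function name `activation_density`, the zero threshold, and the sample data are hypothetical and are not taken from this page or from any particular tool.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as "active" when its activation is strictly
    greater than the threshold (zero by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 98 active tokens out of 100,000.
sample = [1.5] * 98 + [0.0] * (100_000 - 98)
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.098% would mean the feature is active on roughly 1 token in 1,000, which fits a narrow topic such as legal violations.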