INDEX

Explanations

fraud and violation testing

Configuration

Prompts (Dashboard): 392,802 prompts, 256 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

읽  1.32
迭代  1.28
 entwickeln  1.24
 kroz  1.21
 wygod  1.19
갖  1.17
 međ  1.16
께  1.15
 schaffen  1.15
 그래프  1.15

Positive Logits

 perpetrators  1.83
 looting  1.72
 perpetrated  1.68
 atrocities  1.60
 heinous  1.59
 perpetrator  1.57
 assaults  1.57
 crimes  1.57
 assault  1.56
 criminals  1.53

Activations Density 3.711%
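
As a rough sanity check on the figures listed here, the sketch below estimates how many dashboard tokens the feature fires on. It assumes that the activation density is the fraction of all dashboard tokens with a nonzero activation, which the page itself does not state, and the variable names are illustrative only.

```python
# Figures copied from the dashboard configuration and density lines.
NUM_PROMPTS = 392_802
TOKENS_PER_PROMPT = 256
ACTIVATION_DENSITY = 0.03711  # 3.711%, ASSUMED to be a per-token fraction

# Total number of tokens in the dashboard prompt set.
total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT

# Estimated number of tokens on which the feature is active (assumption above).
active_tokens = round(total_tokens * ACTIVATION_DENSITY)

print(f"total tokens: {total_tokens:,}")
print(f"estimated active tokens: {active_tokens:,}")
```

Under that assumption the feature would be active on a few million of the roughly 100 million dashboard tokens. If the density is instead computed per prompt or over a sampled subset, the estimate does not apply.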