Explanations

sexual harassment and assault

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token (quoted to show leading spaces)    Logit
" limpi"                                 -0.91
"を抱"                                   -0.91
"introdu"                                -0.89
" sospe"                                 -0.82
" blatant"                               -0.81
"严重的"                                 -0.81
" prevented"                             -0.81
" prevention"                            -0.81
"introduce"                              -0.80
" mitigating"                            -0.79

Positive Logits

Token (quoted to show leading spaces)    Logit
" attack"                                2.02
" targeting"                             1.96
" attacking"                             1.95
"attack"                                 1.87
" target"                                1.80
" prey"                                  1.72
" Targeting"                             1.70
"targeting"                              1.66
" attacks"                               1.63
" attacked"                              1.60

Activations Density 0.116%
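
For a sense of scale, the density figure can be combined with the prompt configuration. The sketch below assumes (this is not stated on the page) that the density is the fraction of dashboard token positions on which the feature fires; the variable names are illustrative only.

```python
# Figures taken from the Configuration and Activations Density lines.
num_prompts = 24_576
tokens_per_prompt = 128
density_percent = 0.116

# Total token positions across all dashboard prompts.
total_tokens = num_prompts * tokens_per_prompt

# Assumption: density is the share of those positions with a nonzero activation.
approx_active_tokens = round(total_tokens * density_percent / 100)

print(f"total token positions: {total_tokens:,}")
print(f"approx. activating positions: {approx_active_tokens:,}")
```

Under that assumption the feature fires on only a few thousand of roughly three million token positions, which is consistent with a sparse, topic-specific feature.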