INDEX

Explanations

knife attacks and violence

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

"בש"  -0.76
"GM"  -0.69
"ään"  -0.67
"POSURE"  -0.66
"腾讯"  -0.66
" brilliant"  -0.65
"烹饪"  -0.65
" rati"  -0.65
" coronation"  -0.64
"GM"  -0.64

Positive Logits

" knife"  1.27
"knife"  1.10
" attacker"  1.09
"kn"  1.07
"Knife"  1.05
" beheaded"  1.05
" stabbing"  1.00
" knives"  0.94
" Knife"  0.88
"kni"  0.87

Activations Density 0.022%
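
As a rough sanity check on the density figure, the sketch below converts 0.022% into an approximate token count. It assumes (the page does not state this) that the density is measured over the same 24,576 prompts of 128 tokens each listed under Configuration. The variable names are illustrative only.

```python
# Assumption: activation density is computed over the dashboard's own
# prompt set (24,576 prompts x 128 tokens). The page does not confirm this.
N_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.022  # "Activations Density" as shown on the page

# Total number of token positions in the prompt set.
total_tokens = N_PROMPTS * TOKENS_PER_PROMPT

# Approximate number of positions where the feature is active.
approx_active_tokens = round(total_tokens * DENSITY_PERCENT / 100)

print(f"total tokens: {total_tokens:,}")
print(f"approx. active tokens: {approx_active_tokens:,}")
```

Under that assumption the feature fires on only several hundred of roughly three million token positions, which fits a narrow topic such as knife attacks.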