INDEX

Explanations

detrimental effects, therapeutic advice, phishing simulations

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

物  0.43
 hurt  0.40
 hurting  0.39
踹  0.36
 tất  0.36
ড়িয়ে  0.36
ிருந்த  0.36
 خارجية  0.35
 isother  0.35
&$\  0.35

Positive Logits

ওসি  0.43
 Effects  0.42
 அவர்களின்  0.41
Effects  0.40
effects  0.39
 Bahan  0.38
活动  0.38
 библиотека  0.38
 sayfası  0.38
FOLD  0.37

Activations Density 0.000%