INDEX

Explanations

pronoun followed by action verb

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

" हाद"        0.69
"cand"        0.69
"忐"          0.66
" milij"      0.66
" besø"       0.66
"敏感"        0.66
" malaise"    0.66
"葺"          0.64
" honeymoon"  0.64
"快適"        0.64

Positive Logits

" lung"        1.33
" roared"      1.16
" unleashing"  1.08
" roar"        1.02
" hurled"      1.00
" screamed"    1.00
"Lung"         0.96
" roaring"     0.96
" shouted"     0.96
" snar"        0.95

Activations Density 0.132%