Explanations

model response generation

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each
Dataset (Dashboard): lmsys + oasst1

Negative Logits

" ferocious"    0.99
" shri"         0.91
" terrified"    0.89
" violently"    0.88
" terrifying"   0.88
" sobbing"      0.87
" footsteps"    0.87
"큰"            0.86   (Korean, "big")
" bolts"        0.85
" feelings"     0.82

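Positive Logits

"："            1.11   (fullwidth colon)
"मध्ये"         1.02   (Marathi, "in")
"：（"          1.01   (fullwidth colon + fullwidth opening parenthesis)
"美国"          1.00   (Simplified Chinese, "United States")
"美國"          0.99   (Traditional Chinese, "United States")
"मधील"          0.98   (Marathi, "in", adjectival form)
" آمریکا"       0.98   (Persian, "America")
")："           0.97   (closing parenthesis + fullwidth colon)
"umā"           0.97
"）："          0.95   (fullwidth closing parenthesis + fullwidth colon)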
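The two logit lists rank vocabulary tokens by how strongly the feature pushes them up or down. A common way such lists are produced is to take the feature's decoder direction, multiply it by the model's unembedding matrix, and read off the largest and smallest entries. This is assumed here rather than confirmed by the page. The sketch below does this in plain Python over a toy four-token vocabulary. `feature_logits`, `top_and_bottom`, and all vectors are hypothetical illustrations, not real model weights.

```python
from typing import Dict, List, Tuple

Scores = Dict[str, float]


def feature_logits(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Scores:
    """Hypothetical helper: dot the feature's decoder direction with each token's unembedding vector."""
    return {
        token: sum(d * u for d, u in zip(decoder_dir, vec))
        for token, vec in unembed.items()
    }


def top_and_bottom(scores: Scores, k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k highest-scoring tokens and the k lowest-scoring tokens."""
    top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
    bottom = sorted(scores.items(), key=lambda kv: kv[1])[:k]
    return top, bottom


# Toy 3-dimensional residual stream and 4-token vocabulary (made-up numbers).
decoder = [1.0, -0.5, 0.0]
unembed = {
    " ferocious": [-0.8, 0.4, 0.1],
    "：": [0.9, -0.3, 0.2],
    " the": [0.1, 0.1, 0.5],
    "美国": [0.7, -0.4, 0.0],
}

scores = feature_logits(decoder, unembed)
top, bottom = top_and_bottom(scores, k=2)
print("positive:", [(t, round(s, 2)) for t, s in top])     # "：" first, then "美国"
print("negative:", [(t, round(s, 2)) for t, s in bottom])  # " ferocious" first, then " the"
```

With a real model the unembedding has one vector per vocabulary entry (tens of thousands of tokens). In practice this is therefore a single matrix-vector product in a tensor library rather than a Python loop.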

Activation Density 0.008%
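Activation density is typically the fraction of token positions in the sampled data on which the feature's activation is above zero. The page does not define it, so that definition is an assumption here. Suppose the density were measured over exactly the 238,145 × 512 = 121,930,240 token positions listed under Configuration. Then 0.008% would correspond to roughly 9,750 positions. Because the displayed percentage is rounded, the true count could be several hundred higher or lower. The hypothetical `activation_density` helper below shows the calculation on toy data and reproduces that arithmetic.

```python
from typing import Sequence


def activation_density(activations: Sequence[Sequence[float]], threshold: float = 0.0) -> float:
    """Hypothetical helper: fraction of token positions whose activation exceeds `threshold`."""
    total = sum(len(prompt) for prompt in activations)
    active = sum(1 for prompt in activations for a in prompt if a > threshold)
    return active / total if total else 0.0


# Toy data: 3 prompts of 4 tokens each; the feature fires on 2 of 12 positions.
acts = [
    [0.0, 0.0, 1.7, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.4, 0.0, 0.0],
]
print(f"{activation_density(acts):.4%}")  # 16.6667%

# Scale of the dashboard sample (assumes every prompt is exactly 512 tokens).
total_positions = 238_145 * 512
approx_active = round(total_positions * 0.008 / 100)  # positions at exactly 0.008%
print(f"{total_positions:,}")  # 121,930,240
print(f"{approx_active:,}")    # 9,754
```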