Explanations

refusing harmful requests about groups

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

 lorsque    0.43
。『    0.41
codile    0.40
 වීම    0.40
 ಮೇಲೆ    0.39
říklad    0.39
AuthConfig    0.39
甥    0.39
。(    0.38
ateľ    0.38

Positive Logits

 nostrum    0.53
 pellets    0.49
 vials    0.49
 infusions    0.49
 puns    0.49
စာ    0.46
 skewers    0.46
 любых    0.44
 vignettes    0.44
 bitters    0.44

Activations Density 0.004%