Explanations

offering alternatives or disclaimers

Instances of the assistant refusing a request or invoking safety/policy guidance (safety-justification and resource-offer language).

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

Token          Logit
" rythme"      0.35
" each"        0.34
"パターン"     0.33
" Each"        0.33
")，"          0.32
" maxSize"     0.32
"Iterations"   0.31
" rhythm"      0.31
"帧"           0.31
" scal"        0.30

Positive Logits

Token             Logit
" alternativas"   0.79
" alternatives"   0.76
" Alternatives"   0.69
"Alternatives"    0.68
"代わりに"        0.66
" alternative"    0.66
" alternativa"    0.64
"替代"            0.63
" альтерна"       0.62
" instead"        0.60

Activations Density 0.598%
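
To give the density figure a sense of scale, the sketch below combines it with the prompt count and prompt length listed under Configuration. It assumes that every prompt contributes exactly 512 tokens and that the density was measured over this same token set; neither is stated on the page, and the function names are illustrative only.

```python
# Values copied from the Configuration and Activations Density lines.
NUM_PROMPTS = 238_145
TOKENS_PER_PROMPT = 512
ACTIVATION_DENSITY = 0.598 / 100  # 0.598% expressed as a fraction


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Token count, assuming every prompt is exactly tokens_per_prompt long."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density: float) -> int:
    """Rough number of tokens on which the feature fires (assumption:
    the density was measured over this same set of tokens)."""
    return round(total * density)


if __name__ == "__main__":
    total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
    active = estimated_active_tokens(total, ACTIVATION_DENSITY)
    print(f"total tokens: {total:,}")
    print(f"estimated activating tokens: {active:,}")
```

Under those assumptions the feature would fire on roughly one token in every 167, which fits a feature tied to a specific kind of response (refusals that offer alternatives) rather than to general text.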