Explanations

Chinese health queries and JSON keywords

Sentences where the assistant asserts that it is a safe, helpful AI and either refuses or explains why it cannot comply (safety/refusal boilerplate)

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

 pumpkin

0.54

 PACKAGE

0.50

ERSHIP

0.50

ាត់

0.50

ే

0.48

 crispy

0.48

 avoidable

0.48

ach

0.46

inars

0.46

اك

0.46

Positive Logits

0.57

ک

0.57

gpt

0.56

Convers

0.55

ஜ

0.54

María

0.52

Robot

0.52

mig

0.52

openai

0.52

Activations Density: 2.839%