Explanations

your you

Detects assistant safety/crisis-response language: first-person refusals, disclaimers, empathy statements, and offers of help or crisis resources.

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

Token            Logit
" handles"       0.45
" chunks"        0.44
" collections"   0.42
" parity"        0.42
" substantial"   0.40
" parsing"       0.39
" samples"       0.39
"con"            0.39
" exhaust"       0.39

Positive Logits

Token            Logit
"あなたの"       0.88
"your"           0.82
" 당신"          0.82
"Your"           0.79
"你的"           0.79
" あなた"        0.76
"あなたが"       0.75
" Você"          0.74
" вашей"         0.74
"あなた"         0.73

Activations Density: 0.395%