INDEX

Explanations

She spends

Detects when the model (assistant) is producing a long, structured response. It activates on tokens that mark assistant-generated content, such as introductions, headings, and the openers of lists or replies.


Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1
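As a quick sense of scale, and assuming every dashboard prompt is exactly 512 tokens long (padded or truncated), the dashboard covers 238,145 × 512 = 121,930,240 token positions. A minimal sketch of that arithmetic; `total_tokens` is a hypothetical helper name:

```python
# Dashboard configuration as listed on this page.
NUM_PROMPTS = 238_145
TOKENS_PER_PROMPT = 512


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total token positions scanned.

    Assumes every prompt is exactly `tokens_per_prompt` tokens long.
    """
    return num_prompts * tokens_per_prompt


if __name__ == "__main__":
    print(f"{total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT):,}")  # 121,930,240
```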



Negative Logits

Token          Value
"the"          0.94
(not shown)    0.70
"the"          0.69
"or"           0.68
"de"           0.66
(not shown)    0.65
"<0x99>"       0.58
"to"           0.57
"<0x98>"       0.57

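Positive Logits

Token          Value   Note
"𝟬"            0.66    styled (sans-serif bold) digit zero
"are"          0.66
"()=>{"        0.66    arrow-function opener (JavaScript/TypeScript)
"년간"          0.65    Korean, "for N years"
"৯"            0.65    Bengali digit nine
"৮"            0.64    Bengali digit eight
" ہیں۔"        0.63    Urdu, "are" followed by the Urdu full stop
" سيكون"       0.63    Arabic, "will be"
"৭"            0.62    Bengali digit seven
"٠"            0.61    Arabic-Indic digit zero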
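The page does not say how the two logit lists are computed. A common approach for sparse-autoencoder features, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and rank the vocabulary by the resulting logit effect. The sketch uses toy matrices; `w_dec_row`, `w_unembed`, and `top_and_bottom_logits` are hypothetical names, and it skips the final layer norm a real model applies before unembedding.

```python
import numpy as np


def top_and_bottom_logits(w_dec_row: np.ndarray, w_unembed: np.ndarray,
                          vocab: list, k: int = 10):
    """Rank vocabulary tokens by how much one feature's decoder direction shifts their logits.

    Assumption (not confirmed by the dashboard): logit effect = decoder row @ unembedding.
    w_dec_row: (d_model,) decoder vector for one feature (hypothetical input).
    w_unembed: (d_model, vocab_size) unembedding matrix (hypothetical input).
    Final layer norm is ignored in this sketch.
    """
    logit_effects = w_dec_row @ w_unembed      # shape (vocab_size,)
    order = np.argsort(logit_effects)          # indices sorted ascending by effect
    bottom = [(vocab[i], float(logit_effects[i])) for i in order[:k]]
    top = [(vocab[i], float(logit_effects[i])) for i in order[::-1][:k]]
    return top, bottom


if __name__ == "__main__":
    # Toy example: d_model = 2, vocabulary of 4 tokens.
    vocab = ["a", "b", "c", "d"]
    w_unembed = np.array([[1.0, 0.0, -1.0, 0.5],
                          [0.0, 1.0, 0.0, -2.0]])
    w_dec_row = np.array([1.0, 0.5])
    top, bottom = top_and_bottom_logits(w_dec_row, w_unembed, vocab, k=2)
    print("top:", top)        # [('a', 1.0), ('b', 0.5)]
    print("bottom:", bottom)  # [('c', -1.0), ('d', -0.5)]
```

This sketch returns signed values, while the dashboard shows positive numbers under Negative Logits as well; the page does not state whether its displayed values are signed effects or magnitudes.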

Activation Density: 6.869%
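Activation density is usually read as the fraction of token positions on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly greater than zero; `activation_density` and its `threshold` parameter are hypothetical names:

```python
import numpy as np


def activation_density(activations: np.ndarray, threshold: float = 0.0) -> float:
    """Fraction of token positions where the feature's activation exceeds `threshold`.

    Assumption: a position counts as active when activation > threshold (default 0).
    activations: per-token activations for one feature, any shape.
    """
    return float(np.mean(activations > threshold))


if __name__ == "__main__":
    # Toy batch: 2 prompts x 4 tokens, 2 of 8 positions active.
    acts = np.array([[0.0, 1.2, 0.0, 0.0],
                     [0.3, 0.0, 0.0, 0.0]])
    print(f"{activation_density(acts):.3%}")  # 25.000%
```

If the 6.869% figure was measured over the same 238,145 × 512 dashboard tokens (the page does not say), it corresponds to roughly 0.06869 × 121,930,240 ≈ 8.38 million active positions.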