Explanations

that is mainly from

instructions and meta-discussions about AI models, their capabilities or constraints, especially jailbreak-style prompts and references to policies or system rules.

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

Token      Logit
"ⵛ"        0.35
"┍"        0.34
"يف"       0.33
"胍"       0.33
"ლე"       0.33
"？]("     0.32
"␥"        0.32
"ﺔ"        0.32
"άλ"       0.32
"ί"        0.32

Positive Logits

Token               Logit
" that"             0.37
" advertisement"    0.36
" providence"       0.35
" Youtube"          0.34
"not"               0.32
" customer"         0.32
" vision"           0.32
"rod"               0.32
" time"             0.31
"It"                0.31

Activations Density 3.103%
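
The density figure above can be read as a fraction of token positions. The following is a minimal sketch, assuming that "activations density" means the share of tokens in the dashboard prompts on which the feature's activation is greater than zero. The function name `activation_density`, the `threshold` parameter, and the toy values are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: density is defined as (active tokens) / (all tokens).
    """
    if not activations:
        raise ValueError("activations must contain at least one value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy example with made-up activations for eight token positions.
toy_activations = [0.0, 0.0, 1.2, 0.0, 0.4, 0.0, 0.0, 0.0]
density = activation_density(toy_activations)
print(f"{density:.3%}")  # prints "25.000%"
```

Under this reading, a density of 3.103% would mean the feature is active on roughly 3 in every 100 tokens of the 238,145 prompts of 512 tokens each.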