Explanations

preventing victim defense

Structured, emphasized tokens such as headings or labels, text in quotation marks or apostrophes, and numerals or units that mark key or enumerated information.

New Auto-Interp

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

ಸ್  0.40
[blank]  0.39
和  0.39
<0xE7>  0.39
्रेडिट  0.37
್  0.37
菓  0.37
тке  0.37
ере  0.36
、  0.35

Positive Logits

POC  0.35
 Ideal  0.33
 tomto  0.32
 Dream  0.32
 Labrenzia  0.32
offsetLeft  0.32
 نئے  0.32
 vorge  0.32
 ہوا  0.31
 Quantum  0.31

Activations Density 0.220%
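
To put the density figure in context, the short sketch below estimates how many tokens the feature fires on. It rests on two assumptions that the dashboard does not state: that every one of the 238,145 prompts is exactly 512 tokens long, and that the 0.220% density was measured over this same prompt set. The function and constant names are illustrative only.

```python
# Figures copied from the dashboard. Names are hypothetical, chosen for this sketch.
NUM_PROMPTS = 238_145
TOKENS_PER_PROMPT = 512
ACTIVATION_DENSITY = 0.220 / 100  # 0.220% expressed as a fraction


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total tokens in the prompt set, assuming every prompt has the same length."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density: float) -> int:
    """Approximate count of tokens with a nonzero activation.

    Assumes the density is the fraction of tokens in this prompt set
    on which the feature activates.
    """
    return round(total * density)


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = estimated_active_tokens(total, ACTIVATION_DENSITY)

print(f"total tokens: {total:,}")
print(f"estimated activating tokens: {active:,}")
```

Under those assumptions the feature would be active on roughly 268 thousand of about 122 million tokens. The estimate is only as good as the assumptions behind it.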