Explanations

agents or directors

exploit, abuse or endanger

Configuration

Prompts (Dashboard): 392,802 prompts, 256 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

[token missing]    0.38
Sph                0.37
흥                 0.37
曜                 0.36
├──                0.35
.*;                0.35
振                 0.35
 mobs              0.33
習                 0.33
 spherical         0.33

Positive Logits

ড়ার               0.44
ড়া                0.44
өз                 0.43
ড়া                0.42
 रात्री             0.40
钚                 0.40
roi                0.39
য়েক               0.39
')                 0.39
 através           0.37

Activations Density 0.063%