INDEX

Explanations

risk and high concepts

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

Token          Logit
" fucking"     0.36
" elevates"    0.33
"aka"          0.33
" others"      0.32
" دی۔"         0.31
" aware"       0.31
"~,"           0.31
" source"      0.31
" just"        0.31

Positive Logits

Token          Logit
" Serviço"     0.39
"hattim"       0.39
" 大阪"        0.37
"囂"           0.36
" ಹೇಳ"         0.36
"-_-"          0.36
"FROM"         0.36
" சமீ"         0.35
"篒"           0.35
" HAVING"      0.34

Activations Density 0.022%