Explanations

societal norms and power structures

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

Token  Logit
" практика"  0.44
" buena"  0.43
" subscript"  0.43
" aprobación"  0.42
"работка"  0.42
" klima"  0.42
"鐫"  0.41
"%/"  0.41
"追い"  0.41
" buenas"  0.41

Positive Logits

Token  Logit
"Primitive"  0.37
"Jer"  0.37
" traversing"  0.36
"Faz"  0.35
" savag"  0.35
"Fan"  0.35
"ńst"  0.34
"Bou"  0.34
"reflection"  0.34
"becomes"  0.34

Activations Density: 0.000%