Explanations

pronouns referring to a person

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

for  0.51
我們  0.48
 forgery  0.47
我們要  0.47
我们就  0.47
ି  0.45
 ஒன்றை  0.45
কিছু  0.44
 আমরা  0.44
ோம்  0.44

Positive Logits

 wrote  0.49
 flew  0.46
ffes  0.46
 texted  0.45
 could  0.44
 wore  0.44
mming  0.44
can  0.44
泂  0.43
czy  0.43

Activations Density 0.000%