Explanations

Spanish "se" constructions

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each
Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

Token         Logit
ses           -0.17
rador         -0.12
achusetts     -0.11
odore         -0.11
pired         -0.11
woke          -0.11
teenth        -0.11
ductive       -0.11
levard        -0.11
wealth        -0.10

Positive Logits

Token         Logit
oret          0.22
orem          0.22
ories         0.20
owing         0.19
existent      0.17
oretical      0.16
linear        0.16
teen          0.16
neath         0.16
iquement      0.16

Activation Density: 0.284%
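
For a sense of scale, the density figure can be read as the fraction of token positions on which the feature fires. The sketch below is illustrative only: the function name `activation_density` and the toy data are hypothetical, "fires" is assumed to mean a strictly positive activation, and it is an assumption (not stated on the page) that the density was measured over the same 10,000 prompts of 128 tokens listed under Configuration.

```python
def activation_density(acts):
    """Fraction of token positions with a strictly positive activation.

    `acts` is a list of per-prompt lists of activation values
    (hypothetical layout, chosen for illustration).
    """
    total = sum(len(row) for row in acts)
    active = sum(1 for row in acts for a in row if a > 0)
    return active / total if total else 0.0


# Toy data (made up): 4 prompts of 8 tokens, 2 positions active.
toy = [[0.0] * 8 for _ in range(4)]
toy[0][3] = 1.5
toy[2][6] = 0.2
toy_density = activation_density(toy)  # 2 active positions out of 32

# Rough scale of the reported figure, ASSUMING it refers to the
# dashboard sample of 10,000 prompts x 128 tokens.
n_tokens = 10_000 * 128                      # 1,280,000 token positions
expected_active = round(0.00284 * n_tokens)  # about 3,635 positions
```

Under that assumption, a density of 0.284% would correspond to roughly 3,635 active token positions out of 1,280,000.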