Explanations

control of / over / us / by

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each

Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

Token        Logit
"_controls"  -0.10
"ying"       -0.09
"ombo"       -0.09
"çĵľ"        -0.09
"ed"         -0.09
"xlim"       -0.09
"ecut"       -0.09
" Fleming"   -0.09
"ALLY"       -0.09
"ials"       -0.09

Positive Logits

Token        Logit
" freak"     0.18
".Monad"     0.18
" Freak"     0.17
"ateral"     0.16
"-Allow"     0.15
"led"        0.15
"lee"        0.13
"eer"        0.13
"ador"       0.13
"adores"     0.13

Activations Density 0.016%
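
To give the density figure a sense of scale, the sketch below converts it into an approximate token count. It rests on two assumptions that the page does not state: that density means the fraction of sampled tokens on which the feature has a nonzero activation, and that it was measured over the same 10,000 prompts of 128 tokens listed under Configuration. The variable names are illustrative only.

```python
# Figures copied from the Configuration and Activations Density readouts.
NUM_PROMPTS = 10_000
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY_PERCENT = 0.016

# Total tokens in the dashboard sample.
total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT

# Assumption: density = share of sampled tokens with a nonzero activation,
# computed over this same sample.
expected_active_tokens = total_tokens * ACTIVATION_DENSITY_PERCENT / 100

print(f"{total_tokens:,} tokens sampled")
print(f"about {expected_active_tokens:.0f} tokens with a nonzero activation")
```

Under those assumptions the feature fires on only a couple of hundred tokens out of roughly 1.28 million, which fits a narrow, pattern-specific feature.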