Explanations

The token "tro" when followed by "ca", "isi", "ve", "odos", or "ppo"

Configuration

Prompts (Dashboard)

10,000 prompts, 128 tokens each

Dataset (Dashboard)

lmsys/lmsys-chat-1m

Negative Logits

"Pai"      -0.11
"rol"      -0.10
" Keeper"  -0.10
"otic"     -0.10
"pell"     -0.09
"zioni"    -0.09
"ively"    -0.09
"ziej"     -0.09
"Rol"      -0.09
"ーティ"   -0.09

Positive Logits

"tro"      0.17
"Tro"      0.16
"Tro"      0.15
"ika"      0.14
"UBLE"     0.12
"ppo"      0.12
"pe"       0.10
" chuyện"  0.10
"elfth"    0.10
"adero"    0.10

Activations Density 0.023%
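
As a rough sanity check, the density can be converted into an approximate count of activating tokens. This is a minimal sketch that assumes the density is the fraction of dashboard tokens on which the feature fires, and that the 10,000 prompts of 128 tokens each listed under Configuration are the sample it was measured on. Neither assumption is stated on the page, and the variable names are illustrative only.

```python
# Assumption: density = (tokens where the feature is active) / (all dashboard tokens).
prompts = 10_000            # from "Prompts (Dashboard)"
tokens_per_prompt = 128     # from "Prompts (Dashboard)"
density = 0.023 / 100       # "Activations Density 0.023%" as a fraction

total_tokens = prompts * tokens_per_prompt
expected_active = total_tokens * density

print(f"total tokens: {total_tokens:,}")
print(f"approx. activating tokens: {round(expected_active)}")
```

Under these assumptions the feature would fire on roughly 294 of the 1,280,000 sampled tokens, which is consistent with a feature tied to a narrow set of token sequences.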