Explanations

extraction process

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each
Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

Token           Logit
" purified"     -0.13
"Dol"           -0.12
"dol"           -0.11
"peg"           -0.10
" excess"       -0.10
" Fritz"        -0.10
"sut"           -0.10
"dol"           -0.09
" Leap"         -0.09
" noreferrer"   -0.09
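
Logit lists like this one are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and ranking the per-token scores. The Positive Logits list is the other end of the same ranking. The source does not say how these values were computed, so the following is a minimal sketch that assumes this approach. It ignores the final LayerNorm and runs on toy matrices. `top_logit_effects` is a hypothetical helper name, and the shapes `w_dec_row` (d_model,) and `W_U` (d_model, d_vocab) are assumptions.

```python
import numpy as np

def top_logit_effects(w_dec_row, W_U, k=10):
    """Hypothetical helper: rank vocabulary tokens by a feature's direct logit effect.

    w_dec_row: (d_model,) decoder vector of one feature (assumed layout).
    W_U: (d_model, d_vocab) unembedding matrix (assumed layout; final LayerNorm ignored).
    Returns the per-token logits, the k most negative token ids, and the k most positive.
    """
    logits = w_dec_row @ W_U          # one score per vocabulary entry
    order = np.argsort(logits)        # ascending: most negative first
    bottom = order[:k]                # candidates for a "Negative Logits" list
    top = order[::-1][:k]             # candidates for a "Positive Logits" list
    return logits, bottom, top

# Toy example: 3-dimensional residual stream, 5-token vocabulary.
W_U = np.array([[1, 0, 0, 2, -1],
                [0, 1, 0, 0, 1],
                [0, 0, 1, 0, 0]], dtype=float)
w = np.array([1.0, -2.0, 0.5])
logits, bottom, top = top_logit_effects(w, W_U, k=2)
print(bottom.tolist(), top.tolist())  # → [4, 1] [3, 0]
```

With a real model, the returned ids would then be mapped back to token strings with the model's tokenizer.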

Positive Logits

Token           Logit
"æıĲ"           0.18   (likely 提 in byte-level BPE form)
" extraction"   0.18
"_extraction"   0.16
" extract"      0.15
" crushing"     0.14
"Sox"           0.14
" Extraction"   0.14
" solvent"      0.13
" extracting"   0.13
" extractor"    0.13
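
The first positive token, æıĲ, reads like mis-encoded text. It is most likely a token from a GPT-2-style byte-level BPE tokenizer, which maps every raw byte to a printable character. The source does not name the model, so this assumes such a tokenizer. Reversing the mapping gives the UTF-8 bytes E6 8F 90, which decode to 提 (U+63D0), the first character of 提取 ("extract"). That fits the rest of the list. The `bytes_to_unicode` table follows GPT-2's published encoder, and `decode_byte_level_token` is a hypothetical helper name.

```python
def bytes_to_unicode():
    """Rebuild GPT-2's byte -> printable-character table (as in GPT-2's encoder.py)."""
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            # Non-printable bytes are shifted into the range starting at U+0100.
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))

def decode_byte_level_token(token: str) -> str:
    """Hypothetical helper: map each display character back to its byte, then decode UTF-8."""
    char_to_byte = {c: b for b, c in bytes_to_unicode().items()}
    raw = bytes(char_to_byte[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")

print(decode_byte_level_token("æıĲ"))  # → 提
```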

Activation Density: 0.039% (share of tokens on which the feature activates)
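
Activation density is the percentage of token positions where the feature's activation is nonzero. The following is a minimal sketch that assumes a strict `> 0` threshold and a dense array of per-token activations. `activation_density` is a hypothetical helper name. The source does not say which token sample the density was measured on. If it were the 10,000 × 128 = 1,280,000 dashboard tokens, 0.039% would correspond to roughly 500 active positions, and the toy array below reproduces that case.

```python
import numpy as np

def activation_density(acts: np.ndarray, threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of token positions with activation > threshold."""
    return 100.0 * float(np.mean(acts > threshold))

# Toy activations shaped like the dashboard sample (10,000 prompts x 128 tokens, assumed layout).
acts = np.zeros((10_000, 128), dtype=np.float32)
acts[:5, :100] = 1.0  # 500 active positions out of 1,280,000
print(f"{activation_density(acts):.3f}%")  # → 0.039%
```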