Explanations

nothing, Not, nowhere, nada, rien

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each

Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

Token      Logit
"aro"      -0.10
"abol"     -0.09
"okers"    -0.08
"inker"    -0.08
"wsz"      -0.08
"isha"     -0.08
"ousse"    -0.08
"uti"      -0.08
"nage"     -0.08
"oker"     -0.08

Positive Logits

Token         Logit
" nothing"    0.94
" Nothing"    0.77
"nothing"     0.76
" NOTHING"    0.72
"Nothing"     0.70
" nichts"     0.64
" nada"       0.62
" rien"       0.56
" ничего"     0.54
" nulla"      0.39

Activations Density 0.211%
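
As a rough sanity check on the density figure, the sketch below estimates how many tokens the feature fires on. It assumes (the page does not confirm this) that the density is the fraction of token positions with nonzero activation, measured over exactly the dashboard prompts listed under Configuration. The function name is hypothetical and used only for illustration.

```python
# Figures copied from the dashboard; how they relate is an assumption (see above).
NUM_PROMPTS = 10_000
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.211


def estimated_active_tokens(num_prompts, tokens_per_prompt, density_percent):
    """Return (total token positions, estimated positions where the feature fires)."""
    total = num_prompts * tokens_per_prompt
    return total, total * density_percent / 100.0


total_tokens, active_tokens = estimated_active_tokens(
    NUM_PROMPTS, TOKENS_PER_PROMPT, DENSITY_PERCENT
)
print(total_tokens)          # 1280000
print(round(active_tokens))  # 2701
```

Under that assumption the feature would be active on roughly 2,700 of 1.28 million token positions, which fits a feature tied to a narrow set of "nothing" tokens.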