INDEX

Explanations

power and attributes

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each

Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

Token            Logit
" powers"        -0.22
" powered"       -0.21
"powered"        -0.18
"-powered"       -0.18
" powering"      -0.17
" Powered"       -0.17
" Powers"        -0.16
"powers"         -0.16
" PowerShell"    -0.16
" PowerPoint"    -0.16

Positive Logits

Token            Logit
"ela"            0.11
"Authority"      0.10
"ã"              0.10
" influence"     0.09
" authority"     0.09
" mighty"        0.09
"ajes"           0.09
" Corruption"    0.09
"els"            0.09
" attorney"      0.09

Activations Density: 0.168%