Explanations

social circles or networks

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each
Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

Tokens are quoted so that leading spaces stay visible.

Token           Logit
" comrades"     -0.12
" teammate"     -0.11
" Neighbor"     -0.11
" fellow"       -0.10
" neighbor"     -0.10
" colleague"    -0.10
"glo"           -0.09
"omen"          -0.09
"bil"           -0.09
" neighbour"    -0.09

Positive Logits

Tokens are quoted so that leading spaces stay visible.

Token           Logit
" circle"       0.49
" circles"      0.42
"circle"        0.37
" Circle"       0.37
"圈"            0.35
"Circle"        0.34
" networks"     0.34
"-circle"       0.33
" network"      0.33
"_circle"       0.28
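
The page does not say how the logit values are produced. A common convention for dashboards of this kind, assumed here rather than confirmed by the source, is to project the feature's decoder direction through the model's unembedding matrix and list the tokens with the largest and smallest resulting values. The sketch below illustrates that idea in plain Python; the function names, the two-dimensional vectors, and the four-token vocabulary are all hypothetical toy values, not data from this feature.

```python
def logit_effects(decoder_vec, unembed, vocab):
    """Dot the decoder direction with each vocabulary column of `unembed`.

    decoder_vec: list of floats, length d_model (hypothetical feature direction)
    unembed:     d_model rows, each a list with one entry per vocabulary token
    vocab:       list of token strings, one per column of `unembed`
    """
    effects = {}
    for j, token in enumerate(vocab):
        effects[token] = sum(d * row[j] for d, row in zip(decoder_vec, unembed))
    return effects


def top_k(effects, k, positive=True):
    """Return the k tokens with the largest (or smallest) effect."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=positive)
    return ranked[:k]


# Toy values for illustration only.
vocab = [" circle", " network", " fellow", " neighbor"]
decoder_vec = [1.0, 0.5]
unembed = [
    [0.4, 0.3, -0.1, -0.05],
    [0.2, 0.1, 0.0, -0.2],
]

effects = logit_effects(decoder_vec, unembed, vocab)
top_positive = top_k(effects, 2, positive=True)
top_negative = top_k(effects, 2, positive=False)
```

Under this reading, the Positive Logits list holds the tokens the feature pushes the model toward, and the Negative Logits list holds the tokens it pushes away from.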

Activations Density 0.109%
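
To put the density figure in context, the sketch below assumes that density means the fraction of token positions on which the feature has a nonzero activation, and that it is measured over the dashboard prompt set listed under Configuration (10,000 prompts of 128 tokens each). Neither assumption is stated on the page, and the helper `activation_density` is a hypothetical name.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    activations: list of prompts, each a list of per-token activation values.
    """
    total = 0
    active = 0
    for prompt in activations:
        for value in prompt:
            total += 1
            if value > threshold:
                active += 1
    return active / total if total else 0.0


# Figures taken from the Configuration section of the page.
N_PROMPTS = 10_000
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.109

total_tokens = N_PROMPTS * TOKENS_PER_PROMPT
# Approximate number of token positions on which the feature fires,
# valid only if the density was measured over this prompt set (assumption).
approx_active_tokens = round(total_tokens * DENSITY_PERCENT / 100)

# Tiny usage example with made-up activations: 1 active position out of 8.
example_density = activation_density([[0.0, 0.0, 1.5, 0.0], [0.0, 0.0, 0.0, 0.0]])
```

If those assumptions hold, a density of 0.109% over 1,280,000 token positions corresponds to roughly 1,400 active positions, which makes this a sparse feature.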