INDEX

Explanations

what
Method used: 1
Reason: MAX_ACTIVATING_TOKENS are all the same token

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

Token       Logit
"Did"       0.52
"Did"       0.49
"Do"        0.49
"Do"        0.47
"Does"      0.46
"에서는"    0.45
"에서도"    0.45
"로는"      0.45
"では"      0.44
" Does"     0.44

Positive Logits

Token            Logit
" constitutes"   1.19
" happens"       1.09
" kind"          1.05
" happened"      1.05
" transpired"    0.94
" motivates"     0.92
" kinds"         0.82
" resonates"     0.81
" constituye"    0.80
" excites"       0.79

Activations Density: 0.268%