Explanations

voluntary, spontaneous, unsolicited

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token          Logit
"Impact"       -0.93
"tigem"        -0.90
"–"            -0.88
"Cia"          -0.88
" honoring"    -0.83
" permit"      -0.81
" impact"      -0.81
"＾＾；"       -0.81
" има"         -0.80
" reserving"   -0.80

Positive Logits

Token              Logit
" voluntarily"     1.57
" voluntary"       1.49
"oluntary"         1.22
" Voluntary"       1.18
" self"            1.16
" espont"          1.14
" unsolicited"     1.11
" spontaneous"     1.10
" spontaneously"   1.02
" volontaire"      1.02

Activations Density 0.056%
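
To put the density figure in context, here is a rough back-of-the-envelope sketch in Python. It assumes that "Activations Density" is the percentage of tokens in the dashboard prompt set (24,576 prompts of 128 tokens each) on which the feature fires; the page does not state that definition, so the result is only an estimate. The function names are hypothetical and exist only for this illustration.

```python
# Figures copied from the dashboard configuration and density line.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.056  # "Activations Density", read as a percentage of tokens (assumption)


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of tokens in the dashboard prompt set."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density_percent: float) -> int:
    """Approximate count of tokens on which the feature is active,
    assuming density is a percentage of all tokens (not stated on the page)."""
    return round(total * density_percent / 100)


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = estimated_active_tokens(total, DENSITY_PERCENT)
print(f"total tokens: {total:,}")
print(f"estimated active tokens: {active:,}")
```

Under that reading, the feature is active on well under one token in a thousand, which fits a narrow concept such as "voluntary, spontaneous, unsolicited".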