Explanations

motivation and signals

Configuration

Prompts (Dashboard)

16,384 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token             Logit
" incentive"      -2.23
" incentives"     -2.22
" Incentive"      -1.98
" Incentives"     -1.97
"Incenti"         -1.80
"incenti"         -1.77
" incentiv"       -1.48
" incenti"        -1.45
" reward"         -1.07
" Incenti"        -1.07

Positive Logits

Token             Logit
"ClientSize"      0.57
"};*/"            0.54
"WebVitals"       0.53
"Попис"           0.49
"contentLoaded"   0.47
"')));"           0.47
"*~*~"            0.46
"everywhere"      0.46
" createSprite"   0.46
" TestCase"       0.45

Activations Density 0.117%
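
To give a sense of scale for the density figure, the sketch below converts it into an approximate token count. It assumes, and the page does not confirm this, that the density is measured over the dashboard prompts listed under Configuration (16,384 prompts of 128 tokens each). The function name and parameters are hypothetical, chosen only for illustration.

```python
def implied_active_tokens(n_prompts: int, tokens_per_prompt: int,
                          density_percent: float) -> tuple[int, float]:
    """Return (total tokens, estimated tokens on which the feature fires).

    Assumption: the density is the fraction of all dashboard tokens
    with a nonzero activation, expressed as a percentage.
    """
    total_tokens = n_prompts * tokens_per_prompt
    estimated_active = total_tokens * density_percent / 100.0
    return total_tokens, estimated_active


# Values taken from the Configuration and Activations Density lines above.
total, active = implied_active_tokens(16_384, 128, 0.117)
print(f"total tokens: {total:,}")
print(f"estimated active tokens: {active:,.0f}")
```

Under that assumption the feature fires on roughly one token in every 850, which fits a sparse feature tied to a narrow concept.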