INDEX

Explanations

references to specific technical notes or labels in the document

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token (quoted to show leading spaces)    Logit
" indeed"                                -0.75
"Indeed"                                 -0.74
" Indeed"                                -0.70
"AxisAlignment"                          -0.68
" مشين"                                  -0.68
"VENTORY"                                -0.66
"ită"                                    -0.63
" EconPapers"                            -0.63
"iecie"                                  -0.63
" Accesat"                               -0.60

Positive Logits

Token (quoted to show leading spaces)    Logit
"NOTE"                                   2.34
" NOTE"                                  2.13
"WARNING"                                1.70
" WARNING"                               1.38
"IMPORTANT"                              1.23
"CAUTION"                                1.01
" TODO"                                  0.97
" IMPORTANT"                             0.94
" CAUTION"                               0.85
"TODO"                                   0.83

Activations Density 0.003%
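
As a rough sanity check on the figures above, the sketch below estimates how many token positions in the dashboard prompts the feature fires on. It assumes (this is not stated on the page) that the activation density is the fraction of token positions across the dashboard prompts with a nonzero activation; the variable names are illustrative only.

```python
# Figures taken from the dashboard configuration above.
num_prompts = 24_576
tokens_per_prompt = 128
density_percent = 0.003  # "Activations Density 0.003%"

# Total token positions covered by the dashboard prompts.
total_tokens = num_prompts * tokens_per_prompt

# ASSUMPTION: density = share of token positions with nonzero activation.
approx_active_tokens = total_tokens * density_percent / 100

print(f"total tokens: {total_tokens:,}")
print(f"approx. activating tokens: {approx_active_tokens:.0f}")
```

Under that assumption the feature would be active on roughly a hundred tokens out of about three million, which fits a feature tied to a small set of label-like tokens such as NOTE, WARNING, and TODO.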