INDEX

Explanations

behavior

Configuration

Prompts (Dashboard): 16,384 prompts, 128 tokens each
Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token (quoted to show leading spaces)    Logit
" Behaviors"                             -1.25
"Behavioral"                             -1.09
"behavioral"                             -1.09
" behaviors"                             -1.09
" Behavioral"                            -1.02
" Behavioural"                           -1.02
" AssemblyCulture"                       -1.00
" behaviours"                            -0.96
"behaviors"                              -0.95
"haviours"                               -0.93

Positive Logits

Token (quoted to show leading spaces)    Logit
"ing"                                    0.70
"an"                                     0.65
"the"                                    0.61
"ed"                                     0.55
" lining"                                0.51
"er"                                     0.49
"пад"                                    0.47
":");"                                   0.43
"Ass"                                    0.43
" effect"                                0.43

Activations Density 0.232%
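
To give a rough sense of scale, the density figure can be combined with the dashboard prompt configuration (16,384 prompts of 128 tokens each). The sketch below assumes that the density is the fraction of those dashboard tokens on which the feature has a nonzero activation. The page does not state this definition, so treat the result as an estimate only. The variable names are illustrative.

```python
# Assumption (not stated on the page): activation density is the fraction
# of dashboard tokens on which the feature activates at all.
NUM_PROMPTS = 16_384          # from the dashboard configuration
TOKENS_PER_PROMPT = 128       # from the dashboard configuration
ACTIVATION_DENSITY = 0.232 / 100  # 0.232% expressed as a fraction

# Total number of tokens covered by the dashboard prompts.
total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT

# Estimated number of tokens with a nonzero activation, under the assumption above.
estimated_active_tokens = round(total_tokens * ACTIVATION_DENSITY)

print(total_tokens)             # 2097152
print(estimated_active_tokens)  # 4865
```

Under that assumption, the feature would fire on roughly 4,900 of about 2.1 million tokens, which is consistent with a feature tied to a narrow family of tokens.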