Explanations

references to adversarial or opposing forces

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): cerebras/SlimPajama-627B

Negative Logits

Token          Logit
"_nsec"        -0.07
"lov"          -0.07
"ally"         -0.07
"orig"         -0.07
"ifestyles"    -0.07
"asia"         -0.07
"azo"          -0.07
"gere"         -0.07
"cher"         -0.07
"enge"         -0.07

Positive Logits

Token          Logit
"liness"       0.08
"ess"          0.07
"rous"         0.07
"ARRANT"       0.07
"/host"        0.07
"/op"          0.06
"ship"         0.06
" ship"        0.06
"hood"         0.06
" ships"       0.06

Activations Density: 0.009%
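
To give a sense of scale for the density figure, the short sketch below combines it with the prompt configuration listed above. It assumes that activation density means the fraction of tokens on which the feature has a nonzero activation, and that it was measured over the same 24,576 prompts of 128 tokens each; neither assumption is stated on the page, and the displayed percentage is rounded, so the result is only an estimate.

```python
# Figures copied from the dashboard above.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.009  # reported activation density (rounded)

# Total number of tokens in the dashboard prompt set.
total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT

# Assumption: density = (tokens with nonzero activation) / (total tokens).
approx_active_tokens = total_tokens * DENSITY_PERCENT / 100

print(f"total tokens: {total_tokens:,}")
print(f"approx. activating tokens: {approx_active_tokens:.0f}")
```

Under those assumptions the feature fires on roughly 283 of about 3.1 million tokens, which is consistent with a sparse feature.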