Explanations

phrases related to reasons or justifications

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): cerebras/SlimPajama-627B

Negative Logits

Token          Logit
"à¸£à¸¡"       -0.07
"¢°"           -0.07
"ãĤ¥"          -0.07
"dens"         -0.06
"Ä±sÄ±"       -0.06
"ari"          -0.06
"uhn"          -0.06
"udit"         -0.06
"aiser"        -0.06
"Tower"        -0.06

Positive Logits

Token          Logit
" reason"      0.10
"criptor"      0.08
" reasons"     0.08
" apparent"    0.07
" Reason"      0.07
".reason"      0.07
"_reason"      0.07
"reason"       0.07
"iced"         0.07
" Reasons"     0.06

Activations Density 0.005%
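
As a rough sense of scale, the density figure can be related to the dashboard configuration. The sketch below assumes that "Activations Density" means the fraction of token positions in the dashboard prompts on which the feature is active; the page does not state this definition, and the displayed 0.005% is rounded, so the result is only an order-of-magnitude estimate. The variable names are illustrative.

```python
# Rough estimate of how many token positions activate this feature.
# ASSUMPTION: density = fraction of dashboard token positions with a
# nonzero activation. The page does not define the term.

num_prompts = 24_576        # from "Prompts (Dashboard)"
tokens_per_prompt = 128     # from "Prompts (Dashboard)"
density_percent = 0.005     # from "Activations Density" (rounded on the page)

total_tokens = num_prompts * tokens_per_prompt
expected_active = total_tokens * density_percent / 100

print(f"total token positions: {total_tokens:,}")
print(f"estimated active positions: about {expected_active:.0f}")
```

Under that assumption, the feature fires on a few hundred of roughly three million token positions, which is consistent with a narrow feature tied to the word "reason" and its variants.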