Explanations

expressions related to honesty and truthfulness

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): cerebras/SlimPajama-627B

Negative Logits

Token        Logit
"olk"        -0.08
"è£ķ"        -0.07
"erland"     -0.07
"isko"       -0.07
"å¢"         -0.07
"ÙĬÙĥÙĬ"     -0.07
"olio"       -0.07
" norge"     -0.07
"ÐµÑĢÐº"     -0.06
"xab"        -0.06

Positive Logits

Token        Logit
" itself"    0.06
"wing"       0.06
"sin"        0.06
" Ronald"    0.06
"Sin"        0.06
" pole"      0.06
"iaux"       0.05
" ongoing"   0.05
"gency"      0.05
"Dy"         0.05

Activations Density 0.000%