Explanations

words and phrases that convey contrasts between positive and negative experiences

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each
Dataset (Dashboard): cerebras/SlimPajama-627B

Negative Logits

Token                     Logit
"ãģĦãĤĭ" (いる)           -0.08
"å¾Ĵ" (徒)                -0.08
" nues"                   -0.07
"Î»Ïİ" (λώ)               -0.07
"utsch"                   -0.07
"Ð°Ð»Ñİ" (алю)            -0.07
"issy"                    -0.07
"podob"                   -0.07
"ossa"                    -0.07
"bÃŃr" (bír)              -0.07

Positive Logits

Token                     Logit
"antages"                  0.08
"otto"                     0.06
"ara"                      0.06
"(es"                      0.06
"å¤§åĪ©" (大利)            0.06
" ride"                    0.06
"undred"                   0.06
"ru"                       0.06
"Lia"                      0.05
"aware"                    0.05

Activations Density: 0.003%
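
Several tokens in the logit lists above, such as `ãģĦãĤĭ` and `bÃŃr`, are not corrupted text. They look like the printable byte alphabet that GPT-2-style byte-level BPE tokenizers use to display raw UTF-8 bytes. The sketch below assumes that convention applies to this model's tokenizer (the page does not say which tokenizer is used) and shows one way to map such strings back to readable text. The helper names `bytes_to_unicode` and `decode_token` are illustrative, not part of any dashboard API.

```python
def bytes_to_unicode():
    """Build the byte -> printable character table used by GPT-2-style
    byte-level BPE (assumed, not confirmed, for this dashboard)."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted into the range starting at U+0100.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: displayed character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Convert a displayed byte-level token back to UTF-8 text."""
    raw = bytearray()
    for ch in token:
        if ch in UNICODE_TO_BYTE:
            raw.append(UNICODE_TO_BYTE[ch])
        else:
            # Characters outside the table (e.g. a literal space that the
            # page has already rendered) are passed through unchanged.
            raw.extend(ch.encode("utf-8"))
    # Tokens can end mid-character, so replace incomplete sequences
    # instead of raising an error.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãģĦãĤĭ", "å¾Ĵ", "Î»Ïİ", "Ð°Ð»Ñİ", "bÃŃr", "å¤§åĪ©", " nues"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Plain ASCII tokens such as `antages` or ` ride` pass through unchanged, so the same function can be applied to every row of both lists.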