INDEX

Explanations

phrases expressing reluctance or refusal


Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

cerebras/SlimPajama-627B
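The dashboard configuration above amounts to 24,576 × 128 = 3,145,728 token positions. Below is a minimal sketch of how fixed-length prompts like these could be cut from a tokenized corpus stream. The helper name `chunk_prompts`, the use of non-overlapping windows, and the choice to drop a trailing partial window are all illustrative assumptions. This page does not document the actual sampling pipeline.

```python
from itertools import islice
from typing import Iterable, Iterator, List

# Values taken from the dashboard configuration.
N_PROMPTS = 24_576
CTX_LEN = 128


def chunk_prompts(
    token_stream: Iterable[int],
    n_prompts: int = N_PROMPTS,
    ctx_len: int = CTX_LEN,
) -> Iterator[List[int]]:
    """Cut a flat stream of token ids into non-overlapping fixed-length prompts.

    Hypothetical helper: assumes documents are already tokenized and
    concatenated into one stream. A trailing partial window is dropped.
    """
    it = iter(token_stream)
    for _ in range(n_prompts):
        window = list(islice(it, ctx_len))
        if len(window) < ctx_len:
            return  # stream exhausted before a full window could be filled
        yield window


total_tokens = N_PROMPTS * CTX_LEN
print(total_tokens)  # 3145728
```

Non-overlapping windows ensure that no token position is counted twice. This matters when per-token statistics, such as activation density, are computed over the same prompts.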


Negative Logits

"hw"         -0.07
"маз"        -0.07
".fun"       -0.06
"etsk"       -0.06
"ITT"        -0.06
" Compass"   -0.06
"ỳ"          -0.06
"bjerg"      -0.06
"้ง"          -0.06

Positive Logits

" thank"     0.15
"thank"      0.12
"Thank"      0.12
" Thank"     0.12
"谢"         0.10
" THANK"     0.09
" thanks"    0.08
"thanks"     0.08
" Thanks"    0.08
"pref"       0.08
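Positive and negative logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and then sorting the resulting per-token scores. The sketch below assumes that direct projection, with no layer-norm folding or other corrections. `top_logits` is a hypothetical helper. The toy vocabulary and weights are invented for illustration and are not this feature's real parameters.

```python
from typing import List, Sequence, Tuple


def top_logits(
    w_dec: Sequence[float],          # decoder direction of one feature, length d_model
    W_U: Sequence[Sequence[float]],  # unembedding matrix, shape (d_model, d_vocab)
    vocab: Sequence[str],            # token string for each vocabulary index
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive direct logit effects."""
    d_vocab = len(vocab)
    # logits[j] = sum_i w_dec[i] * W_U[i][j]
    logits = [sum(w * row[j] for w, row in zip(w_dec, W_U)) for j in range(d_vocab)]
    order = sorted(range(d_vocab), key=lambda j: logits[j])  # ascending
    negative = [(vocab[j], logits[j]) for j in order[:k]]
    positive = [(vocab[j], logits[j]) for j in reversed(order[-k:])]
    return negative, positive


# Toy example (invented numbers, d_model = 2, d_vocab = 4).
vocab_toy = [" thank", "hw", " Compass", "pref"]
W_U_toy = [
    [1.0, -1.0, 0.0, 0.5],
    [0.0, 0.5, -0.5, 0.0],
]
w_dec_toy = [0.2, 0.1]
negative, positive = top_logits(w_dec_toy, W_U_toy, vocab_toy, k=2)
print(negative)
print(positive)
```

Sorting the scores once and slicing both ends yields the two tables in a single pass. With a real model, `W_U` has tens of thousands of columns, so a vectorized array library would be the practical choice.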

Activation Density: 0.065%
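Activation density is the fraction of token positions on which a feature fires. The minimal sketch below rests on two assumptions, neither of which this page states. First, a feature counts as firing when its activation is strictly above a threshold of 0. Second, the 0.065% figure was measured over the dashboard's own 24,576 × 128 tokens. `activation_density` is a hypothetical helper.

```python
from typing import Sequence


def activation_density(acts: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`."""
    if not acts:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in acts if a > threshold)
    return fired / len(acts)


# Back-of-envelope estimate. IF the 0.065% density were measured over the
# 24,576 x 128 dashboard tokens (an assumption), the feature would fire on
# roughly this many token positions:
expected_firing = round(0.00065 * 24_576 * 128)
print(expected_firing)  # 2045
```

Under these assumptions, the feature fires on only about 2,045 of roughly 3.1 million token positions, which makes it a very sparse feature.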