Explanations

phrases related to refusal or avoidance in communication

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): cerebras/SlimPajama-627B

Negative Logits

Token       Logit
"ordo"      -0.08
"tero"      -0.07
"atat"      -0.07
"atos"      -0.07
"aggi"      -0.07
"ibold"     -0.07
"lew"       -0.07
"olle"      -0.06
"lop"       -0.06
"ipsoid"    -0.06

Positive Logits

Token           Logit
" answer"       0.08
" comment"      0.07
"exact"         0.07
" dwell"        0.07
" سک"           0.07
" exact"        0.07
"answer"        0.07
" specifics"    0.07
" concrete"     0.06
"是否"          0.06

Activations Density 0.010%
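
To put the density figure in context, here is a minimal sketch of the arithmetic behind the dashboard numbers. It assumes, without confirmation from the page itself, that the activation density is the fraction of dashboard tokens on which the feature fires and that it was measured over the same 24,576 x 128 token sample. The variable names are illustrative only.

```python
# Figures copied from the dashboard configuration above.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY_PERCENT = 0.010

# Total number of token positions in the dashboard sample.
total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT

# ASSUMPTION: density is the share of those positions with a nonzero
# activation, so this is only a rough estimate of the firing count.
approx_active_tokens = total_tokens * ACTIVATION_DENSITY_PERCENT / 100

print(f"total tokens: {total_tokens:,}")
print(f"approx. active tokens: {approx_active_tokens:.0f}")
```

Under that assumption the feature would fire on roughly a few hundred of about three million tokens, which is consistent with a sparse, narrowly scoped feature.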