Explanations

terms related to sensitivity and sensitive conditions or topics

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

cerebras/SlimPajama-627B

Negative Logits

Token        Logit
"dden"       -0.09
"illard"     -0.08
"anh"        -0.08
"oded"       -0.08
"atee"       -0.07
" đại"       -0.07
"_BAND"      -0.07
"imizer"     -0.07
"amate"      -0.07
"ongan"      -0.07

Positive Logits

Token            Logit
" sensitive"     0.09
"-sensitive"     0.09
" sensitivity"   0.08
"Sensitive"      0.07
" toward"        0.06
"ensitive"       0.06
" regard"        0.06
" touch"         0.06
" sens"          0.06
"ires"           0.06

Activations Density 0.013%
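
To give the density figure a sense of scale, the short sketch below converts it into an approximate token count. It assumes, and the page does not confirm this, that the 0.013% density was measured over the dashboard prompt set listed under Configuration (24,576 prompts of 128 tokens each). The variable names are illustrative only.

```python
# Assumption: the activation density refers to the dashboard prompt set
# (24,576 prompts x 128 tokens); the page itself does not state this.
prompts = 24_576
tokens_per_prompt = 128
density_percent = 0.013

# Total number of token positions in the prompt set.
total_tokens = prompts * tokens_per_prompt

# Approximate number of positions on which the feature is active.
active_tokens = total_tokens * density_percent / 100

print(f"total tokens: {total_tokens:,}")
print(f"approx. active tokens: {round(active_tokens)}")
```

Under that assumption the feature would fire on roughly 400 of about 3.1 million token positions, which fits a sparse feature tied to a narrow family of tokens such as " sensitive" and " sensitivity".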