Explanations

references to personal safety and threats

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

cerebras/SlimPajama-627B

Negative Logits

Token (quoted)    Logit
"è¥"              -0.07
"adar"            -0.07
"CHAT"            -0.06
"ÑĳÑĢ"            -0.06
"ìĹ¼"             -0.06
"asic"            -0.06
" ascent"         -0.06
"à¸Ńà¸«"          -0.06
"okus"            -0.06
"itech"           -0.06

Positive Logits

Token (quoted)    Logit
" safety"         0.14
" Safety"         0.13
" protection"     0.13
"Safety"          0.12
" Protection"     0.12
" threats"        0.11
"-threat"         0.11
"å®īåħ¨"          0.11
" security"       0.11
"Protection"      0.11

Activations Density 0.054%
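
To give the density figure a sense of scale, the sketch below converts it into an approximate token count. It assumes the density is the percentage of tokens on which the feature fires and that it was measured over the dashboard prompts listed under Configuration; the page states neither, so the result is only indicative. The function name is hypothetical.

```python
def expected_active_tokens(num_prompts, tokens_per_prompt, density_percent):
    """Return (total tokens, estimated tokens with non-zero activation)."""
    total = num_prompts * tokens_per_prompt
    # density_percent is a percentage, so divide by 100 to get a fraction.
    return total, total * density_percent / 100.0


total, active = expected_active_tokens(24_576, 128, 0.054)
print(total, round(active))
```

With these inputs the dashboard covers 3,145,728 tokens, of which roughly 1,700 would activate the feature.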