Explanations

expressions of moral outrage and feelings of shame or disgust

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

cerebras/SlimPajama-627B

Negative Logits

Token               Logit
"impl"              -0.06
"ge"                -0.06
" Lamp"             -0.06
"ulator"            -0.05
"BTS"               -0.05
" challenge"        -0.05
" stresses"         -0.05
" Paulo"            -0.05
"_TLS"              -0.05
" reconstructed"    -0.05

Positive Logits

Token               Logit
"izon"              0.07
"ahren"             0.07
"RATION"            0.07
"ackers"            0.07
"ÅĻet"              0.07
"vard"              0.07
" trav"             0.07
"orable"            0.07
"èįĴ"               0.07
"iaux"              0.07

Activations Density 0.037%