Explanations

terms associated with fear, avoidance, or negative reactions towards various subjects

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): cerebras/SlimPajama-627B

Negative Logits

Token            Logit
" Laur"          -0.06
"eron"           -0.06
"Å"              -0.06
"fter"           -0.06
"erspective"     -0.06
"mun"            -0.06
"war"            -0.05
" Ø¬ÙĦ"          -0.05
" Digit"         -0.05
"enso"           -0.05

Positive Logits

Token            Logit
"ustral"         0.07
" Acres"         0.07
"Merit"          0.07
"anza"           0.07
"aleigh"         0.06
"ombat"          0.06
"_Invoke"        0.06
"ictim"          0.06
"unker"          0.06
"icom"           0.06

Activations Density 0.003%
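
To give the density figure some scale, the short sketch below converts it into an approximate token count. It rests on two assumptions that the page does not state: that density means the share of tokens on which the feature has a nonzero activation, and that it was measured over the dashboard prompt set listed under Configuration. The function name is hypothetical, and because the reported percentage is rounded, the result is only a rough estimate.

```python
# Figures copied from the dashboard page.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.003  # reported activation density, in percent


def estimate_active_tokens(num_prompts, tokens_per_prompt, density_percent):
    """Return (total tokens, estimated number of tokens where the feature fires).

    Assumption: density is the percentage of tokens with nonzero activation,
    measured over exactly num_prompts * tokens_per_prompt tokens.
    """
    total_tokens = num_prompts * tokens_per_prompt
    active_tokens = total_tokens * density_percent / 100.0
    return total_tokens, active_tokens


total, active = estimate_active_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT, DENSITY_PERCENT)
print(f"total tokens: {total:,}")
print(f"estimated active tokens: {active:.0f}")
```

Under these assumptions the feature fires on roughly 94 of about 3.1 million tokens, which fits a sparse feature that responds only to a narrow set of contexts.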