Explanations

terms related to violence and violent actions

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

cerebras/SlimPajama-627B

Negative Logits

Token               Logit
"bsolute"           -0.07
"esel"              -0.07
"senal"             -0.07
"rupted"            -0.07
"ony"               -0.07
".scalablytyped"    -0.07
"elian"             -0.07
"idar"              -0.07
"_EMIT"             -0.06
"anian"             -0.06

Positive Logits

Token        Logit
"ness"       0.08
"rome"       0.08
"rone"       0.08
" nature"    0.07
"ĺ"          0.06
" Nature"    0.06
" pend"      0.06
"pend"       0.06
"LOAT"       0.06

Activations Density 0.009%
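
To give a sense of scale for the figures above, the following minimal sketch multiplies out the prompt configuration (24,576 prompts of 128 tokens each) and applies the 0.009% density to it. It assumes the density is the fraction of tokens in that same prompt set on which the feature has a nonzero activation; the dashboard does not state this, and the function names are illustrative only.

```python
# Figures copied from the dashboard text above.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.009  # "Activations Density 0.009%"


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of token positions across all prompts."""
    return num_prompts * tokens_per_prompt


def approx_active_tokens(total: int, density_percent: float) -> float:
    """Rough count of activating token positions.

    Assumption (not stated by the dashboard): density is the share of
    token positions in this prompt set with a nonzero activation.
    """
    return total * density_percent / 100.0


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = approx_active_tokens(total, DENSITY_PERCENT)

print(f"total tokens: {total:,}")
print(f"approx. active tokens: {active:.0f}")
```

Under that assumption the feature would fire on only a few hundred of roughly three million token positions, which fits a sparse feature with a narrow explanation.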