Explanations

actions associated with physical violence and aggression

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): cerebras/SlimPajama-627B

Negative Logits

Token        Logit
"çīĩ"        -0.07
"atics"      -0.07
"act"        -0.07
"habi"       -0.07
"pez"        -0.07
".CR"        -0.07
"erc"        -0.07
"iÅ¡tÄĽ"     -0.06
"imers"      -0.06
"utt"        -0.06

Positive Logits

Token             Logit
" unspecified"    0.07
"ulary"           0.06
"sob"             0.06
" unnamed"        0.06
"sap"             0.06
(blank)           0.05
"ushi"            0.05
" physical"       0.05
"etag"            0.05
"leÅŁ"            0.05

Activations Density 0.012%
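
To put the density figure in context, the short sketch below converts it into an approximate token count using the prompt configuration listed above. It assumes that "Activations Density" means the fraction of dashboard tokens on which the feature has a nonzero activation; the page itself does not define the term. The function name is hypothetical, and because the percentage is rounded the result is only an estimate.

```python
def approx_active_tokens(num_prompts: int, tokens_per_prompt: int,
                         density_percent: float) -> float:
    """Estimate how many tokens the feature fires on.

    Assumption: density is the share of all dashboard tokens with a
    nonzero activation, expressed as a percentage.
    """
    total_tokens = num_prompts * tokens_per_prompt
    return total_tokens * density_percent / 100.0


# Values taken from the dashboard: 24,576 prompts of 128 tokens, density 0.012%.
total = 24_576 * 128
approx = approx_active_tokens(24_576, 128, 0.012)
print(f"{total:,} tokens in total, roughly {round(approx)} with a nonzero activation")
```

Under that reading, the feature fires on only a few hundred of the roughly three million tokens, which is consistent with a sparse, narrowly scoped feature.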