Explanations

references to punching or physical aggression

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

cerebras/SlimPajama-627B

Negative Logits

(token not shown)  -0.07
"ieg"  -0.07
"'gc"  -0.07
" Holmes"  -0.07
"ipc"  -0.07
"ForMember"  -0.07
"oplay"  -0.06
"рава"  -0.06
"igate"  -0.06
"/ay"  -0.06

Positive Logits

" punching"  0.08
" punch"  0.08
"ardon"  0.08
" punched"  0.08
" holes"  0.07
"elog"  0.07
" fist"  0.07
" punches"  0.07
"лан"  0.07
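Dashboards like this one commonly build the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix and reading off the largest and smallest entries. The sketch below assumes that procedure; `top_logit_effects`, the toy vocabulary, and the tiny identity-style `W_U` are hypothetical stand-ins, not this dashboard's actual code or weights.

```python
import numpy as np


def top_logit_effects(decoder_direction, W_U, vocab, k=10):
    """Return the k most-promoted and k most-suppressed tokens for one feature.

    decoder_direction: array of shape (d_model,), the feature's decoder vector (assumed).
    W_U: array of shape (d_model, d_vocab), the model's unembedding matrix (assumed).
    vocab: list of d_vocab token strings, indexed like the columns of W_U.
    """
    # Direct effect of one unit of feature activation on every output logit.
    logit_effects = decoder_direction @ W_U  # shape (d_vocab,)
    order = np.argsort(logit_effects)  # indices sorted ascending by effect
    negative = [(vocab[i], float(logit_effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(logit_effects[i])) for i in order[::-1][:k]]
    return positive, negative


# Toy, hypothetical weights: d_model = 4, six-token vocabulary.
vocab = [" punch", " punching", " fist", " Holmes", "ieg", " holes"]
W_U = np.eye(4, 6)  # column j < 4 reads out residual dimension j; columns 4-5 are zero
direction = np.array([0.0, 2.0, 0.0, -1.0])

pos, neg = top_logit_effects(direction, W_U, vocab, k=2)
print(pos[0])  # (' punching', 2.0)
print(neg[0])  # (' Holmes', -1.0)
```

These scores capture only the direct path from the feature to the output logits; they ignore the final LayerNorm and any later layers, so they approximate rather than fully describe the feature's effect on predictions.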

Activations Density: 0.006%
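Activation density is the fraction of token positions on which the feature fires. The page does not state the denominator. If it is the dashboard's prompt set (24,576 prompts of 128 tokens each, which is an assumption), that set has 3,145,728 positions, so 0.006% works out to roughly 189 activating positions. The sketch below computes density from an activation array. The zero threshold and the `activation_density` helper are illustrative assumptions.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature activation exceeds threshold."""
    activations = np.asarray(activations)
    return float(np.mean(activations > threshold))


# Toy example: 4 prompts x 5 tokens; the feature fires on 2 of the 20 positions.
acts = np.zeros((4, 5))
acts[1, 3] = 1.7
acts[2, 0] = 0.4
print(activation_density(acts))  # 0.1

# Scale of the reported figure, assuming the prompt set above is the denominator.
n_positions = 24_576 * 128
print(n_positions)  # 3145728
print(round(0.006 / 100 * n_positions))  # 189
```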