Explanations

instances of individuals taking action to assist others in distress

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

cerebras/SlimPajama-627B

Negative Logits

jom          -0.07
Å¼e          -0.07
anki         -0.07
egt          -0.07
IALIZED      -0.07
aldi         -0.07
ä¼ı          -0.07
ottes        -0.07
usercontent  -0.07
.hwp         -0.07
Positive Logits

 intervention   0.11
 rescue         0.10
 helping        0.10
 intervene      0.10
 resc           0.10
 help           0.10
 interventions  0.09
 interven       0.09
 intervened     0.09
 Intervention   0.08
Activations Density 0.081%
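
To give a rough sense of scale, the configuration and density figures above can be combined into an approximate count of activating tokens. This is a minimal sketch, not the dashboard's own computation: it assumes that "Activations Density" means the share of all dashboard tokens on which the feature has a non-zero activation, and the function and variable names are hypothetical.

```python
# Figures copied from the dashboard.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY_PERCENT = 0.081


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of tokens covered by the dashboard prompts."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density_percent: float) -> int:
    """Approximate number of tokens on which the feature fires.

    Assumption: density is the percentage of tokens with a non-zero
    activation. The dashboard value is rounded, so this is an estimate.
    """
    return round(total * density_percent / 100)


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = estimated_active_tokens(total, ACTIVATION_DENSITY_PERCENT)
print(f"total tokens: {total:,}")
print(f"estimated activating tokens: {active:,}")
```

Under that assumption the feature fires on roughly 2,500 of about 3.1 million tokens, which is consistent with a sparse, narrowly scoped feature.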