
Explanations

References to railings and safety features in architectural contexts


Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

cerebras/SlimPajama-627B


Negative Logits

 straw

-0.08

çª

-0.07

ccione

-0.07

すす

-0.07

 NDEBUG

-0.07

 BoxFit

-0.07

 Insecta

-0.07

vä

-0.07

 Diagnostic

-0.06

сыл

-0.06
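Several tokens in this list look garbled because tokenizer vocabularies of the GPT-2 byte-level BPE kind store each raw byte as a printable stand-in character. The model is not named on this page, so treating its tokenizer as GPT-2-style byte-level BPE is an assumption, though the raw strings fit that pattern. The minimal sketch below rebuilds the standard GPT-2 byte-to-character table and inverts it. Under that table the raw vocabulary string `ÑģÑĭÐ»` decodes to Cyrillic `сыл`, `vÃ¤` to `vä`, and `ãģĻãģĻ` to Japanese `すす`. `çª` holds only the first two bytes of a three-byte UTF-8 character, so on its own it cannot decode to a complete character. The helper names `bytes_to_unicode` and `decode_token` are illustrative.

```python
def bytes_to_unicode() -> dict[int, str]:
    """GPT-2-style map from each byte value (0-255) to a printable character."""
    # Bytes that are already printable keep their own character.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Remaining bytes (control chars, space, etc.) are shifted to code points 256+.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: stand-in character -> original byte value.
_BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Decode a raw byte-level vocabulary string into readable text.

    Incomplete multi-byte sequences become U+FFFD instead of raising.
    """
    raw = bytes(_BYTE_DECODER[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


print(decode_token("ÑģÑĭÐ»"))  # сыл
print(decode_token("vÃ¤"))     # vä
```

The function expects raw vocabulary strings. A literal space is not in the stand-in table (space is stored as `Ġ`), so already-decoded display strings such as ` straw` should not be passed to it.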

Positive Logits

 Safety

0.07

 safety

0.07

afety

0.07

Safety

0.07

-height

0.07

 stair

0.07

 Cable

0.06

inst

0.06

 pedestrian

0.06

hog

0.06
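Dashboards of this kind usually derive positive and negative logits by projecting the feature's decoder direction through the model's unembedding matrix. Each token then gets a score for how strongly the feature pushes that token's logit up or down. This page does not state its method, so the sketch below assumes that projection. It uses plain Python lists, and `logit_effects`, `decoder_row`, `unembed`, and the toy numbers are all hypothetical.

```python
def logit_effects(decoder_row: list[float], unembed: list[list[float]],
                  vocab: list[str], k: int = 10):
    """Return the k most positive and k most negative (token, score) pairs.

    decoder_row: the feature's decoder direction, length d_model (hypothetical).
    unembed: d_model x vocab_size unembedding matrix as nested lists (hypothetical).
    """
    d_model = len(decoder_row)
    scores = []
    for j, token in enumerate(vocab):
        # Direct effect on token j: dot product of the decoder direction
        # with column j of the unembedding matrix.
        score = sum(decoder_row[i] * unembed[i][j] for i in range(d_model))
        scores.append((token, score))
    ranked = sorted(scores, key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


# Toy example: 2-dimensional residual stream, 4-token vocabulary (made-up values).
vocab = [" safety", " straw", " stair", " the"]
decoder_row = [1.0, 0.0]
unembed = [
    [0.07, -0.08, 0.06, 0.0],
    [0.5, 0.5, 0.5, 0.5],
]
positive, negative = logit_effects(decoder_row, unembed, vocab, k=2)
print(positive)  # [(' safety', 0.07), (' stair', 0.06)]
print(negative)  # [(' straw', -0.08), (' the', 0.0)]
```

The scores measure only the feature's direct path to the output logits. That helps explain why they are small here (roughly ±0.06 to 0.08) and why the negative list is a loose mix of unrelated tokens, while the positive list clusters around safety and stair vocabulary.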

Activation Density 0.001%
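A density of 0.001% is easier to read as a count. The sketch below assumes two things this page does not confirm: that density is measured over the dashboard sample from the configuration (24,576 prompts of 128 tokens each), and that a token counts as active when its activation is above zero. On those assumptions the sample has 3,145,728 token positions, and 0.001% of them is about 31 positions where the feature fires. `activation_density` and its `threshold` parameter are hypothetical helpers.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold` (assumed rule)."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Sample size taken from the dashboard configuration.
n_tokens = 24_576 * 128          # token positions in the sample
density = 0.001 / 100            # 0.001% expressed as a fraction
expected_active = n_tokens * density
print(n_tokens, round(expected_active))  # 3145728 31
```

Because the reported percentage is rounded, the true count could differ somewhat from 31. It is still a very sparse feature.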