Explanations

references to violence and harmful ideologies, particularly relating to genocide and oppression

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

cerebras/SlimPajama-627B

Negative Logits

less      -0.06
ナ        -0.06
          -0.05
yth       -0.05
 sparing  -0.05
shop      -0.05
-less     -0.05
aint      -0.05

Positive Logits

avou      0.09
tuk       0.09
apesh     0.08
hoot      0.08
ersive    0.08
eryl      0.08
знача     0.08
-pills    0.07
นค        0.07
bard      0.07
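
Token strings from byte-level BPE vocabularies often surface in dashboards in their raw stored form, for example ãĥĬ, Ð·Ð½Ð°ÑĩÐ° or à¸Ļà¸Ħ, rather than as readable text. The sketch below shows one way to turn such strings back into text. It assumes the tokenizer uses the GPT-2 style byte-to-character table, which this page does not state; the helper names bytes_to_unicode and decode_token are illustrative, not part of any dashboard API.

```python
def bytes_to_unicode():
    """Build the GPT-2 style table mapping each byte (0-255) to a printable character.

    Assumption: the feature's tokenizer uses this byte-level BPE convention.
    """
    # Bytes that are already printable keep their own code point.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Remaining bytes (control characters, space, etc.) are shifted to 256 and up.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: displayed character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(raw: str) -> str:
    """Map each displayed character back to its byte, then decode the bytes as UTF-8."""
    data = bytes(CHAR_TO_BYTE[ch] for ch in raw)
    # A single token may hold only part of a multi-byte character, so
    # undecodable bytes are replaced instead of raising an error.
    return data.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for raw in ["ãĥĬ", "Ð·Ð½Ð°ÑĩÐ°", "à¸Ļà¸Ħ"]:
        print(repr(raw), "->", repr(decode_token(raw)))
```

Under that assumption, the three raw strings decode to ナ, знача and นค. A leading Ġ in a raw token string would decode to a leading space.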

Activations Density 0.072%
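
To put the density figure in context, the short calculation below combines it with the prompt configuration listed above. It assumes, without confirmation from this page, that the density is the fraction of all tokens in the prompt set on which the feature has a nonzero activation; the variable names are illustrative.

```python
# Figures taken from the dashboard.
n_prompts = 24_576
tokens_per_prompt = 128
density_percent = 0.072

# Total number of token positions the feature was evaluated on.
total_tokens = n_prompts * tokens_per_prompt

# Assumption: density = (tokens where the feature is active) / (all tokens).
expected_active = density_percent / 100 * total_tokens

print(f"total tokens: {total_tokens:,}")
print(f"approx. tokens with nonzero activation: {round(expected_active):,}")
```

Under that reading, the feature would be active on roughly 2,265 of the 3,145,728 tokens.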