INDEX

Explanations

themes of hatred and animosity towards individuals or groups

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): cerebras/SlimPajama-627B

Negative Logits

Token      Logit
"bourg"    -0.07
"ocab"     -0.07
"idunt"    -0.07
"δά"       -0.07
"čet"      -0.07
".sg"      -0.07
" omas"    -0.07
"eroon"    -0.07
"assin"    -0.07
".ws"      -0.07

Positive Logits

Token         Logit
" anim"       0.07
"恨"          0.07
" dating"     0.07
" rival"      0.06
" hatred"     0.06
" Anim"       0.06
" hostile"    0.06
" distrust"   0.06
" venom"      0.06
" unle"       0.06

Activations Density 0.061%