INDEX

Explanations

instances of violence or aggression

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

cerebras/SlimPajama-627B

Negative Logits

Token     Logit
"ayi"     -0.07
"iali"    -0.06
"wi"      -0.06
"erif"    -0.06
"λιά"     -0.06
"amac"    -0.06
"avig"    -0.06
"lip"     -0.06
"kker"    -0.06

Positive Logits

Token        Logit
" toward"    0.11
" towards"   0.10
" onto"      0.09
" naar"      0.08
"ใส"         0.08
"at"         0.07
"/Peak"      0.07
" into"      0.07
" Towards"   0.07
"at"         0.07

Activations Density 0.027%
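
To put the density figure in context, the short sketch below estimates how many tokens in the dashboard sample the feature fires on. It assumes that "Activations Density" means the fraction of sampled tokens with a nonzero activation, and that the density was measured over the same 24,576 prompts of 128 tokens listed above; neither assumption is confirmed by this page, and the variable names are illustrative only.

```python
# Rough estimate of how many sampled tokens activate this feature.
# Assumption (not stated on the page): density = activating tokens / total tokens,
# measured over the dashboard prompt set.
NUM_PROMPTS = 24_576        # from "Prompts (Dashboard)"
TOKENS_PER_PROMPT = 128     # from "Prompts (Dashboard)"
DENSITY_PERCENT = 0.027     # from "Activations Density"

total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT
active_tokens = total_tokens * DENSITY_PERCENT / 100

print(f"total tokens sampled: {total_tokens:,}")
print(f"estimated activating tokens: {active_tokens:.0f}")
```

Under these assumptions the sample holds 3,145,728 tokens, of which roughly 849 would activate the feature. The reported density is rounded, so the count is approximate.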