Explanations

safe, secure

Configuration

Prompts (Dashboard): 16,384 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token        Logit
"safe"       -1.41
" safe"      -1.38
" Safe"      -1.07
"Safe"       -1.04
" safely"    -0.94
" SAFE"      -0.92
" sichere"   -0.91
" safer"     -0.88
" veilig"    -0.86
" Safely"    -0.86

Positive Logits

Token                 Logit
"apimachinery"        0.54
"yama"                0.52
"KURZBESCHREIBUNG"    0.52
"JsonHelper"          0.50
"tsin"                0.50
" satellite"          0.49
"bought"              0.48
"السكان"              0.48
"lifted"              0.47
"bonucleic"           0.47

Activations Density 0.057%
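
To give the density figure a sense of scale, the sketch below converts it into an approximate token count using the prompt configuration listed above. It rests on two assumptions that this page does not confirm: that the density is the fraction of token positions on which the feature has a nonzero activation, and that it was measured over the same 16,384 x 128 dashboard prompts. The displayed 0.057% is also rounded, so the result is only a rough estimate.

```python
# Figures copied from the dashboard configuration on this page.
NUM_PROMPTS = 16_384
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.057  # "Activations Density" as displayed (rounded)

# Total token positions covered by the dashboard prompts.
total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT

# Assumption (not stated on the page): density is the share of token
# positions where the feature fires, measured over these same prompts.
approx_active_tokens = total_tokens * DENSITY_PERCENT / 100

print(f"total token positions: {total_tokens:,}")
print(f"approx. active positions: {round(approx_active_tokens):,}")
```

Under those assumptions the feature would be active on roughly a thousand of about two million token positions, which fits a sparse feature tied to a narrow set of tokens.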