INDEX

Explanations

protection and shielding

protection and its contexts

Configuration

Prompts (Dashboard)

392,802 prompts, 256 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

";%("  0.42
"プロセス"  0.41
" impresionante"  0.41
"ंड"  0.39
"ceğini"  0.39
"ヂストン"  0.39
" impressive"  0.39
"얹"  0.39
"清新"  0.39
" introducir"  0.39

Positive Logits

" shield"  0.89
" afforded"  0.79
" shields"  0.78
" against"  0.77
" shielding"  0.74
" Shield"  0.69
"伞"  0.68
"Against"  0.68
"against"  0.67
"shield"  0.66

Activations Density 0.033%
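
As a rough sense of scale, the prompt count, tokens per prompt, and activation density listed on this page can be combined into an estimate of how many tokens the feature fires on. This is only a sketch: it assumes the density is the fraction of all tokens in the listed prompts on which the feature is active, which the page does not state. The variable names are illustrative.

```python
# Figures copied from the dashboard.
num_prompts = 392_802
tokens_per_prompt = 256
density_percent = 0.033  # "Activations Density 0.033%"

# Total tokens across all prompts.
total_tokens = num_prompts * tokens_per_prompt

# ASSUMPTION: density is measured per token over these same prompts.
estimated_active_tokens = total_tokens * density_percent / 100

print(f"total tokens: {total_tokens:,}")
print(f"estimated active tokens: {estimated_active_tokens:,.0f}")
```

Under that assumption the feature would be active on roughly 33 thousand of about 100 million tokens.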