INDEX

Explanations

harmed networks

Configuration

Prompts (Dashboard)

16,384 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token        Logit
" harm"      -1.23
" injury"    -0.82
"nets"       -0.81
" harming"   -0.81
" Harm"      -0.80
" harmed"    -0.80
" Nets"      -0.79
" nets"      -0.76
" harms"     -0.74
"Nets"       -0.69

Positive Logits

Token             Logit
"XtraReports"     0.81
"PerformLayout"   0.80
" समीक्षाएं"        0.80
"WriteBarrier"    0.71
"TagMode"         0.69
"postsleuth"      0.65
"UnsafeEnabled"   0.63
"sizeCache"       0.63
"makeText"        0.62
"aarrggbb"        0.61

Activations Density 0.357%
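
To give the density figure a sense of scale, the sketch below combines it with the prompt configuration listed earlier (16,384 prompts of 128 tokens each). It assumes that the density is the fraction of token positions in that prompt set on which the feature is non-zero. The page does not state this definition, so the second number is a rough estimate and not a reported result. The function and constant names are illustrative only.

```python
# Figures copied from the dashboard; everything else is an assumption.
NUM_PROMPTS = 16_384
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY = 0.357 / 100  # 0.357% expressed as a fraction


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Number of token positions in the dashboard prompt set."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density: float) -> int:
    """Rough count of positions where the feature fires.

    Assumption: density = active token positions / all token positions.
    """
    return round(total * density)


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = estimated_active_tokens(total, ACTIVATION_DENSITY)
print(f"total token positions: {total:,}")
print(f"estimated active positions: {active:,}")
```

Under that assumption the feature would fire on roughly one position in every 280, which fits a feature tied to a narrow set of tokens.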