Explanations

risk quantification

Configuration

Prompts (Dashboard)

16,384 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

iture

-0.25

 Depos

-0.24

alu

-0.24

深入 ("in depth")

-0.23

arty

-0.23

炀

-0.23

.depth

-0.23

 warmly

-0.23

_switch

-0.22

深度 ("depth")

-0.22

Positive Logits

概率 ("probability")

0.67

 probability

0.66

 probabilities

0.63

的概率 ("probability of")

0.59

 statistically

0.55

Probability

0.54

几率 ("odds", "probability")

0.54

probability

0.54

 Probability

0.50

 proportion

0.47
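Token strings on dashboards like this one are sometimes shown in the tokenizer's raw byte-level form, where a leading space appears as `Ġ` and non-ASCII text appears as sequences such as `æ¦Ĥçİĩ`. The sketch below shows one way to turn such strings back into readable text. It assumes the underlying model uses a GPT-2 style byte-level BPE vocabulary, which this page does not state; the function names are illustrative, not part of any dashboard API.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2 style byte -> printable-character table (assumed tokenizer)."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to an unused code point at 256 and above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: printable character -> original byte value.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Map a byte-level token string back to UTF-8 text (hypothetical helper)."""
    raw = bytes(BYTE_DECODER[ch] for ch in token)
    # Tokens can end mid-character, so replace undecodable bytes instead of raising.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["æ¦Ĥçİĩ", "æ·±åº¦", "Ġprobability"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, several of the top positive tokens are Chinese words for "probability", which matches the English tokens in the same list.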

Activations Density 0.166%
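To get a rough sense of scale for the density figure, the sketch below combines it with the dashboard configuration listed above. It assumes, without confirmation from this page, that the density is the fraction of tokens in the dashboard prompt set on which the feature has a nonzero activation; the variable names are illustrative.

```python
# Values copied from the dashboard configuration.
n_prompts = 16_384
tokens_per_prompt = 128
density_percent = 0.166  # "Activations Density 0.166%"

# Total number of token positions in the dashboard prompt set.
total_tokens = n_prompts * tokens_per_prompt

# Assumption: density = (tokens with nonzero activation) / (all tokens).
approx_active_tokens = round(total_tokens * density_percent / 100)

print(f"total tokens: {total_tokens:,}")
print(f"approx. activating tokens: {approx_active_tokens:,}")
```

If that reading of the density is right, the feature fires on only a few thousand of roughly two million token positions.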