INDEX

Explanations

was

Configuration

Prompts (Dashboard)

16,384 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

 Ultimate

-0.25

小于

-0.25

 Liberation

-0.25

ставлен

-0.25

为空

-0.25

(Parcel

-0.24

骞

-0.24

يرا

-0.24

\Active

-0.24

 drain

-0.24

Positive Logits

两个

0.35

 towards

0.33

 close

0.30

in

0.30

two

0.29

ousse

0.29

 specifically

0.28

 toward

0.27

opus

0.27

 handic

0.26

Activations Density 0.109%