Explanations

the word "against" and phrases indicating direction and impact

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token              Logit
"ValueStyle"       -0.63
"orteur"           -0.63
" habet"           -0.59
"ambito"           -0.57
" locomotion"      -0.57
"UTERS"            -0.56
"Distribuzione"    -0.56
"لينكات"           -0.55
"writeField"       -0.55
" ladle"           -0.55

Positive Logits

Token              Logit
" Against"         0.69
" wall"            0.66
"+#+#"             0.64
"Against"          0.62
" Wall"            0.58
" AGAINST"         0.58
" against"         0.57
"tingly"           0.57
"against"          0.56
" WALL"            0.56

Activations Density: 0.019%
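
To give a sense of scale for the density figure, the sketch below estimates how many dashboard tokens the feature fires on. It assumes that activation density means the fraction of all tokens with a nonzero activation, and that it was measured over the dashboard prompt set listed under Configuration; neither assumption is stated on the page. The function names are hypothetical and exist only for this illustration.

```python
# Rough estimate only. Assumes (not confirmed by the page) that
# activation density = active tokens / total tokens, measured over
# the dashboard prompts listed under Configuration.

def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of token positions in the prompt set."""
    return num_prompts * tokens_per_prompt


def expected_active_tokens(
    num_prompts: int, tokens_per_prompt: int, density_percent: float
) -> float:
    """Approximate count of token positions where the feature is active."""
    return total_tokens(num_prompts, tokens_per_prompt) * density_percent / 100.0


if __name__ == "__main__":
    n_tokens = total_tokens(24_576, 128)
    n_active = expected_active_tokens(24_576, 128, 0.019)
    print(f"total tokens: {n_tokens:,}")        # 3,145,728
    print(f"approx. active tokens: {n_active:.0f}")  # about 598
```

Under these assumptions the feature is active on roughly 600 of about 3.1 million tokens, which fits a narrow feature tied to a single word and its casing variants.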