Explanations

instructions and warnings

The neuron fires on advisory or directive language—phrases urging or instructing people to do (or not do) something.

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token        Logit
"ဋ"          -1.23
"/}."        -1.12
"vieve"      -1.07
" prêtres"   -1.05
"for"        -1.02
" ibland"    -1.02
"ཙ"          -1.02
" aunque"    -1.01
"}],"        -1.00
"дні"        -1.00

Positive Logits

Token         Logit
"ﻻ"           1.38
" bemer"      1.34
"⢕"           1.32
" verhind"    1.31
"Familien"    1.28
"anstal"      1.27
"Glück"       1.25
" déposer"    1.23
"他们"         1.21
" Wednesday"  1.21

Activations Density 0.015%
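
To put the density figure in perspective, the sketch below converts it into an approximate token count using the prompt configuration listed on this page. It assumes that "density" means the fraction of all tokens in the prompt set on which the neuron has a non-zero activation; that definition is not stated here and is an assumption. The variable names are illustrative only, and because 0.015% is a rounded display value, the result is an estimate.

```python
# Figures quoted on this page.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.015  # rounded as displayed, so the result is approximate

# Assumption: density = (tokens with non-zero activation) / (all tokens in the prompt set).
total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT
approx_active_tokens = total_tokens * DENSITY_PERCENT / 100

print(f"total tokens: {total_tokens:,}")
print(f"approx. active tokens: {approx_active_tokens:.0f}")
```

Under that assumption, the neuron would be active on roughly 470 of about 3.1 million tokens, which fits a sparse, narrowly scoped feature.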