INDEX

Explanations

the

This neuron primarily detects the word “the,” especially when it appears in code comments or documentation.

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token            Logit
"Sabes"          -1.55
"TÉ"             -1.53
"躄"             -1.52
"könig"          -1.52
"Estás"          -1.51
" constat"       -1.48
"Nép"            -1.48
"Zubereitung"    -1.47
"𝐖"              -1.47
"vereignty"      -1.43

Positive Logits

Token                          Logit
" from"                        1.91
"At"                           1.53
" provides"                    1.52
"one"                          1.46
(token not shown in source)    1.43
" with"                        1.42
"as"                           1.41
" Even"                        1.36
" information"                 1.34
"is"                           1.33
Activations Density 0.001%
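
To put the density figure in context, the sketch below combines it with the prompt configuration listed above. It assumes that the density is the fraction of dashboard tokens on which the neuron has a nonzero activation, and that it was measured over the same 24,576 prompts; neither assumption is confirmed by this page. The function names are hypothetical, and because the reported 0.001% is presumably rounded, the result is only an order-of-magnitude estimate.

```python
# Figures copied from the Configuration and Activations Density lines above.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.001  # reported value, presumably rounded


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of tokens across all dashboard prompts."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density_percent: float) -> float:
    """Rough count of activating tokens.

    Assumption (not stated on the page): density is the share of tokens
    with a nonzero activation, expressed as a percentage.
    """
    return total * density_percent / 100.0


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = estimated_active_tokens(total, DENSITY_PERCENT)

print(f"total tokens: {total:,}")
print(f"estimated activating tokens: {active:.0f}")
```

Under these assumptions the neuron would fire on only a few dozen of roughly three million tokens, so it is very sparse.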