Explanations

theorem

The neuron specializes in spotting occurrences of the word “theorem.”

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token      Logit
\&         -2.94
was        -2.92
?"         -2.78
柒         -2.70
潆         -2.53
arschu     -2.41
。         -2.41
bzw        -2.41
или        -2.39
는         -2.39

Positive Logits

Token      Logit
(blank)    4.63
↵↵         3.41
The        3.02
(blank)    2.97
<i>        2.53
</i>       2.41
箆         2.33
鹬         2.33
ly         2.31

Activations Density 0.010%
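
To give a feel for what a density of 0.010% means at this dashboard's scale, the arithmetic can be sketched as below. This is a rough illustration only: it assumes the density is the fraction of dashboard tokens on which the neuron has a nonzero activation, and that it was measured over the 24,576 prompts of 128 tokens listed under Configuration. Neither assumption is confirmed by the page, and the function name `estimate_active_tokens` is hypothetical.

```python
def estimate_active_tokens(
    n_prompts: int, tokens_per_prompt: int, density_percent: float
) -> tuple[int, float]:
    """Return (total tokens, estimated number of activating tokens).

    Assumption: density_percent is the share of all tokens on which
    the neuron fires at all, expressed as a percentage.
    """
    total_tokens = n_prompts * tokens_per_prompt
    active_tokens = total_tokens * density_percent / 100.0
    return total_tokens, active_tokens


# Values taken from the Configuration and Activations Density lines.
total, active = estimate_active_tokens(24_576, 128, 0.010)
print(f"total tokens: {total:,}")
print(f"estimated activating tokens: {active:.0f}")
```

Under these assumptions the neuron would fire on roughly 315 of about 3.1 million tokens, which fits a feature that tracks a single word such as "theorem."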