Explanations

hidden items

The neuron activates strongly on occurrences of the word “hidden.”

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

" また": -2.70
(token not rendered): -2.39
"鲥": -2.31
"絰": -2.19
"呖": -2.09
(token not rendered): -2.08
" crede": -2.06
"儅": -2.00
(token not rendered): -1.99
"—": -1.98

Positive Logits

"ୌ": 2.59
"नलोड": 2.45
"Reparto": 2.30
"lossians": 2.23
" diciendo": 2.23
"葎": 2.20
(token not rendered): 2.20
"pecabe": 2.20
"不愿意": 2.16
"писок": 2.14

Activation Density: 0.006%
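
To give a sense of scale, the density figure can be combined with the prompt configuration listed above (24,576 prompts of 128 tokens each). The sketch below is only a back-of-the-envelope estimate: it assumes the density is the fraction of token positions with a nonzero activation and that it was measured over exactly these dashboard prompts, neither of which this page states. The constant and function names are illustrative.

```python
# Figures copied from this page; names are illustrative, not from any library.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY_PERCENT = 0.006  # rounded value as displayed on the page


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of token positions across all prompts."""
    return num_prompts * tokens_per_prompt


def expected_active_tokens(total: int, density_percent: float) -> float:
    """Estimated count of token positions with a nonzero activation.

    Assumes density = active positions / total positions (an assumption).
    """
    return total * density_percent / 100.0


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = expected_active_tokens(total, ACTIVATION_DENSITY_PERCENT)

print(f"total token positions: {total:,}")
print(f"estimated active positions: {active:.0f}")
```

Under these assumptions the feature fires on roughly 190 of about 3.1 million token positions. Because the displayed percentage is rounded, the true count could differ by a few dozen.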