INDEX

Explanations

improvisation and disambiguation

The neuron consistently activates on words built on the “improv-” root (e.g., improvisation, improvised, improv).
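As a rough text-level baseline for this explanation, the sketch below uses a regular expression to pick out "improv-" words in a string. The pattern and the helper name `find_improv_words` are illustrative assumptions, not part of the dashboard. The pattern also deliberately excludes "improve" and its derivatives, which is a guess about what the explanation means rather than something the dashboard states.

```python
import re
from typing import List

# Hypothetical pattern: matches "improv", "improvs", and "improvis..." forms
# (improvise, improvised, improvisation), but not "improve" or "improving".
IMPROV_PATTERN = re.compile(r"\bimprov(?:s|is\w*)?\b", re.IGNORECASE)


def find_improv_words(text: str) -> List[str]:
    """Return every word in `text` that matches the assumed "improv-" pattern."""
    return [match.group(0) for match in IMPROV_PATTERN.finditer(text)]


sample = "She improvised a solo at the improv night; improvisation helps players improve."
print(find_improv_words(sample))
# ['improvised', 'improv', 'improvisation']
```

Comparing these matches with the tokens where the neuron actually fires would be one way to check how well the explanation holds; the regex only looks at surface text and says nothing about the model's tokenization.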

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

ſu  -2.58
 Their  -2.41
 hacen  -2.27
(token not recovered)  -2.25
犼  -2.25
 peculiar  -2.25
(token not recovered)  -2.17
 folgender  -2.08
 soooo  -2.06
 genellikle  -2.06

Positive Logits

(token not recovered)  3.02
the  2.75
酲  2.58
Напомним  2.56
芣  2.55
Ранее  2.48
岁  2.30
۱۹  2.23
haviours  2.19

Activations Density 0.002%
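
To put the density figure in perspective, the arithmetic below estimates how many token positions it corresponds to. This is a back-of-the-envelope sketch that assumes the density is the fraction of token positions with nonzero activation, measured over the dashboard prompts listed under Configuration. The page does not confirm either assumption, and the variable names are illustrative.

```python
# Values taken from the Configuration and Activations Density lines.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY = 0.002 / 100  # 0.002% expressed as a fraction

# Assumption: density = (token positions where the neuron fires) / (all token positions).
total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT
expected_active_tokens = total_tokens * ACTIVATION_DENSITY

print(f"total tokens: {total_tokens:,}")                          # total tokens: 3,145,728
print(f"expected active tokens: {expected_active_tokens:.0f}")    # expected active tokens: 63
```

Under these assumptions the neuron fires on roughly 63 of about 3.1 million token positions, which fits a feature tied to a single, fairly rare word root.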