Explanations

new and improved

This neuron detects promotional or comparative adjectives and adverbs that highlight improvements, enhancements, or increases (e.g., “more,” “improved,” “bigger,” “safer,” “new”).

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

"if"             -1.64
"At"             -1.54
"at"             -1.51
"However"        -1.43
"During"         -1.40
"or"             -1.38
" suelen"        -1.34
" predomin"      -1.34
" Regardless"    -1.32
" these"         -1.31

Positive Logits

(blank)          1.48
" gange"         1.38
"🪛"             1.34
" новые"         1.33
"ఽ"              1.32
" JUNE"          1.29
"萏"             1.29
"逦"             1.28
" interak"       1.26
"鵙"             1.26

Activations Density 0.087%
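
To give a rough sense of scale for the density figure, the sketch below combines it with the prompt configuration listed above. It assumes, and the page does not state this, that activation density means the share of sampled token positions on which the feature fires. The constant and function names are illustrative only.

```python
# Figures copied from the Configuration and Activations Density entries.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY_PERCENT = 0.087


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total token positions covered by the dashboard sample."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density_percent: float) -> int:
    """Approximate count of positions where the feature is active.

    Assumption: density is the percentage of token positions with a
    nonzero activation. The source page does not define the term.
    """
    return round(total * density_percent / 100)


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = estimated_active_tokens(total, ACTIVATION_DENSITY_PERCENT)
print(f"{total:,} token positions, roughly {active:,} with the feature active")
```

Under that assumption, the feature fires on only a few thousand of roughly three million sampled positions, which is consistent with a sparse feature.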