Explanations

Interpretation of Neuron Behavior

Observation:

* `MAX_ACTIVATING_TOKENS`: rich, perfectly, Bright, simple, warm, simple, wide, intelligent, painted, wide
* `TOKENS_AFTER_MAX_ACTIVATING_TOKEN`: voice, sculpt, crimson, songs, energy, fellow, assortment, song, faces, ex

The `MAX_ACTIVATING_TOKENS` often describe a quality or characteristic, and the `TOKENS_AFTER_MAX_ACTIVATING_TOKEN` often represent something that possesses that quality or is modified by it.

* `rich voice`
* `perfectly sculpt` (or sculpted)
* `Bright crimson`
* `simple songs`
* `warm energy`
* `simple fellow`
* `wide assortment`
* `intelligent songwriting` (from `TOP_ACTIVATING_TEXTS` context, even though `song` is here)
* `painted faces`
* `wide expanse` (from `TOP_ACTIVATING_TEXTS` context, `wide` -> `expanse`)

The pattern is consistently an adjective followed by a noun it modifies. The adjective signifies a descriptive quality, and the noun is the object or concept being described.

Proposed Explanation:

adjective followed by noun
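The pairing step in the observation above can be sketched in a few lines of Python. This is only an illustration: the two lists are copied from the observation, the helper name `pair_tokens` is hypothetical, and it assumes the lists are aligned by position (the i-th following token comes right after the i-th max-activating token), which the listed pairs suggest but the page does not state outright.

```python
# Token lists copied verbatim from the observation.
max_activating = ["rich", "perfectly", "Bright", "simple", "warm",
                  "simple", "wide", "intelligent", "painted", "wide"]
following = ["voice", "sculpt", "crimson", "songs", "energy",
             "fellow", "assortment", "song", "faces", "ex"]


def pair_tokens(first, second):
    """Join position-aligned tokens into two-token phrases (hypothetical helper)."""
    if len(first) != len(second):
        raise ValueError("token lists must have the same length")
    return [f"{a} {b}" for a, b in zip(first, second)]


pairs = pair_tokens(max_activating, following)
for phrase in pairs:
    print(phrase)
```

Some of the resulting phrases end in token fragments (`song`, `ex`) rather than whole words, which is why the explanation falls back on the `TOP_ACTIVATING_TEXTS` context for `songwriting` and `expanse`.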

Configuration

Prompts (Dashboard)

16,384 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

asca    -0.06
å®¶å®¶    -0.06
èĬ³åįİ    -0.06
ÙĨÙĬØ¹    -0.06
eneral    -0.06
Tail    -0.06
alanya    -0.06
prototype    -0.06
TAG    -0.06
ç¿¼ç¿¼    -0.06

Positive Logits

—at    0.05
å¯¹äººçļĦ    0.05
ajan    0.05
éĢĤæĹ¶    0.05
oder    0.05
 Locker    0.05
rg    0.05
 locker    0.05
è¾º    0.04
hum    0.04

Activations Density 0.006%
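
To put the density figure in perspective, the rough arithmetic below estimates how many tokens it corresponds to. It assumes, without confirmation from the page, that the density is measured over the dashboard prompt set listed under Configuration (16,384 prompts of 128 tokens each); all variable names are illustrative.

```python
# Values taken from the Configuration section and the density line.
n_prompts = 16_384
tokens_per_prompt = 128
density = 0.006 / 100  # 0.006% expressed as a fraction

# Assumption: density is the fraction of dashboard tokens with nonzero activation.
total_tokens = n_prompts * tokens_per_prompt
expected_active = total_tokens * density

print(f"total tokens: {total_tokens:,}")
print(f"approx. active tokens: {round(expected_active)}")
```

Under that assumption the feature fires on roughly 126 of about 2.1 million tokens, so the top-activating examples cover a sizeable share of everything it responds to in this sample.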