Explanations

The user wants a concise explanation (3-20 words) of a neuron's behavior, based on patterns found in four lists. Let's analyze the lists:

1. MAX_ACTIVATING_TOKENS: `1, 1, 2, 1, ik, 2, 2, dist, 1, the`
   * Contains numbers (1, 2) and common words ('the'). Also 'ik' and 'dist'.
2. TOKENS_AFTER_MAX_ACTIVATING_TOKEN: `9, -, game, -, af, . , 4, orts, 0, preferred`
   * Words like 'game' and 'preferred' appear.
3. TOP_POSITIVE_LOGITS: `ne, which, ki, id, that, in, which, in, from, ma`
   * Common words and potential word beginnings/endings.
4. TOP_ACTIVATING_TEXTS:
   * Mix of code snippets, technical explanations, and conversational text.
   * One text mentions "CS1955", "CS1-DQ", "CS1-I2C", "CS2 game process", "CS2. Rank S", "CS240 server", "CS109", "CS50’s".

cs1, cs2, cs109, cs240
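The first two lists above pair the highest-activating token in each top prompt with the token that immediately follows it. The following is a minimal sketch of that extraction, assuming per-token activations are available as a plain list. The function name `max_and_next_token` and the example tokens and activation values are hypothetical and not taken from the dashboard.

```python
from typing import List, Optional, Tuple


def max_and_next_token(tokens: List[str], acts: List[float]) -> Tuple[str, Optional[str]]:
    """Return the highest-activating token and the token after it (None if it is last)."""
    if not tokens or len(tokens) != len(acts):
        raise ValueError("tokens and acts must be non-empty and the same length")
    # Index of the maximum activation (first one wins on ties).
    i = max(range(len(acts)), key=acts.__getitem__)
    nxt = tokens[i + 1] if i + 1 < len(tokens) else None
    return tokens[i], nxt


# Hypothetical tokens and activations, for illustration only (not dashboard data).
tokens = ["Taking", " CS", "1", "09", " this", " term"]
acts = [0.0, 0.4, 2.7, 0.3, 0.0, 0.0]
print(max_and_next_token(tokens, acts))  # ('1', '09')
```

Collecting these pairs across many prompts produces lists shaped like MAX_ACTIVATING_TOKENS and TOKENS_AFTER_MAX_ACTIVATING_TOKEN.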

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1
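The configuration implies how large the token pool behind this dashboard is. Assuming every prompt is exactly 512 tokens long, as stated, the total is a single multiplication:

```python
# Figures from the dashboard configuration; assumes every prompt is exactly 512 tokens.
NUM_PROMPTS = 238_145
TOKENS_PER_PROMPT = 512

total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT
print(f"{total_tokens:,}")  # 121,930,240
```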

Negative Logits

地方 0.60
 nostre 0.57
お得 0.56
朋友 0.54
я 0.53
不 0.53
 nossos 0.53
 vijf 0.52
 échange 0.52
వ 0.52

Positive Logits

ne 0.80
 which 0.77
ki 0.76
id 0.74
 that 0.72
in 0.71
which 0.71
in 0.71
 from 0.71
ma 0.71
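The page does not say how the positive and negative logit lists were produced. A common approach for feature dashboards is to project the feature's decoder direction onto the model's unembedding matrix and rank vocabulary tokens by the resulting score. The following is a minimal sketch under that assumption. The function name, the toy matrices, and the token names are all hypothetical, and the scores are not comparable to the values listed above, since the page does not state how those are scaled or signed.

```python
import numpy as np


def top_and_bottom_logits(decoder_dir, W_U, vocab, k=2):
    """Rank vocabulary tokens by how much a feature direction raises or lowers their logits.

    decoder_dir: (d_model,) direction the feature writes into the residual stream.
    W_U: (d_model, vocab_size) unembedding matrix.
    Returns (top_k, bottom_k) lists of (token, score) pairs.
    """
    scores = decoder_dir @ W_U   # one score per vocabulary token
    order = np.argsort(scores)   # ascending order of scores
    bottom = [(vocab[i], float(scores[i])) for i in order[:k]]
    top = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return top, bottom


# Toy example with made-up numbers: d_model=2, vocabulary of 4 hypothetical tokens.
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
W_U = np.array([[0.8, -0.6, 0.1, 0.5],
                [0.0, 0.0, 0.0, 0.0]])
decoder_dir = np.array([1.0, 0.0])
top, bottom = top_and_bottom_logits(decoder_dir, W_U, vocab)
print(top)     # [('tok_a', 0.8), ('tok_d', 0.5)]
print(bottom)  # [('tok_b', -0.6), ('tok_c', 0.1)]
```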

Activation Density 0.000%
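The page does not define how activation density is computed. A common definition for sparse features is the fraction of token positions where the feature's activation is above zero. The following is a minimal sketch under that assumption. The `threshold` parameter and the activation array are hypothetical. The second part assumes, purely for illustration, that density were measured over all 238,145 × 512 dataset tokens and displayed rounded to three decimals. Under that assumption, a reading of 0.000% means very sparse rather than never active.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds `threshold`."""
    acts = np.asarray(acts)
    return float((acts > threshold).sum()) / acts.size


# Hypothetical activations: 4 prompts x 512 tokens, with the feature firing on 2 tokens.
acts = np.zeros((4, 512))
acts[0, 10] = 3.2
acts[2, 100] = 1.5
density = activation_density(acts)
print(f"{density:.3%}")  # 0.098%

# Assumption: density over all 121,930,240 dataset tokens, shown to three decimals.
# Up to 609 active tokens would still display as 0.000%.
TOTAL_TOKENS = 238_145 * 512
print(f"{609 / TOTAL_TOKENS:.3%}")  # 0.000%
print(f"{610 / TOTAL_TOKENS:.3%}")  # 0.001%
```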