INDEX

Explanations

names preceded by bold markers

Self-correction: The prompt asks for something specific without mentioning tokens or patterns. The list shows `` in MAX_ACTIVATING_TOKENS aligning with names in TOP_ACTIVATING_TEXTS and TOP_POSITIVE_LOGITS. The TOKENS_AFTER_MAX_ACTIVATING_TOKEN are also names/initials. The explanation should capture the detected entity type. The `` token often acts as a marker for bolded words, and bolded words in this context are typically names. Therefore, the neuron seems to focus on names that are presented in a specific format (bolded).

names in bold

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

" Barca"     0.82
" Minaj"     0.73
" দলটি"      0.72
" जानवर"     0.71
" FastAPI"   0.70
"az"         0.69
"公主"       0.69
"ާތ"         0.69
"**"         0.68
"(**"        0.68

Positive Logits

" Sean"      1.10
" Steve"     1.04
"Steve"      1.04
" Alan"      1.02
" Mike"      1.02
" Matthew"   1.01
" Andrew"    1.01
" Michael"   0.98
" William"   0.97
" Josh"      0.97

Activation Density: 0.076%
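
To give a sense of scale for the density figure, the arithmetic can be sketched as follows. This is only a rough estimate under two assumptions that the dashboard does not state: that the density is the fraction of token positions with a nonzero activation, and that it was measured over the same 238,145 prompts of exactly 512 tokens each listed in the configuration. The function names are illustrative, not part of any tool.

```python
# Figures copied from the dashboard; how they relate is an assumption (see above).
NUM_PROMPTS = 238_145
TOKENS_PER_PROMPT = 512
DENSITY_PERCENT = 0.076


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total token positions, assuming every prompt has the full length."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density_percent: float) -> int:
    """Token positions where the feature fires, if density is a per-token fraction."""
    return round(total * density_percent / 100)


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = estimated_active_tokens(total, DENSITY_PERCENT)

print(f"{total:,} token positions in total")
print(f"about {active:,} positions with a nonzero activation")
```

Under those assumptions the feature would fire on the order of 90,000 token positions out of roughly 122 million, which is consistent with a sparse feature tied to a narrow pattern such as bolded names.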