INDEX

Explanations

* 右前方にカラーコーンがあるので、少し右にコースを変えます。 (Migi zenpō ni karā kōn ga aru node, sukoshi migi ni kōsu o kaemasu.) - There is a traffic cone ahead on the right, so we will change course slightly to the right.

The pattern here connects "評価" (evaluation/assessment) or "コース" (course) with "変" (change). The top activating text shows a change of course. A change of evaluation is not explicit in TOP_ACTIVATING_TEXTS, but "評価" appears in MAX_ACTIVATING_TOKENS. The most direct and specific pattern observed is changing something.

Checked against the rules:

- Concise (3-20 words): yes
- Single word, phrase, or pattern: yes
- About tokens following or preceding: yes ("変" followed by "えます", forming "変えます")
- About words starting with a sequence: not applicable here
- Avoid listing tokens: yes
- Just say the pattern: yes
- No "This neuron detects/predicts": yes
- No "tokens" or "patterns": yes
- No initial capitalization unless proper noun: yes
- Specific: yes, "change course" or "change evaluation" is specific
- Majority match: the "change" aspect is strong

MAX_ACTIVATING_TOKENS contains "評価" (evaluation) and "を変" (the object particle plus the stem of "change"), and TOKENS_AFTER_MAX_ACTIVATING_TOKEN contains "えます" (the verb ending). Together these strongly suggest "評価を変える" (to change an evaluation). The top activating text contains "コースを変えます" (change course), which corroborates the idea of change, specifically a change of course.

The core behavior is therefore "change" plus an object:

- 評価を変えます (change evaluation)
- コースを変えます (change course)

The object can be something that is evaluated or a physical direction, so "change evaluation", "change direction", and "change course" are all candidates. Because "評価" is a direct token in MAX_ACTIVATING_TOKENS, and "変" + "えます" forms "変えます" (to change), the most direct interpretation is:

change evaluation
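The "object + を変えます" pattern discussed above can be checked mechanically. The following is a minimal sketch, not part of the dashboard: the helper name `changed_objects` and the regular expression are assumptions made for illustration, and the expression only recognizes objects written in kanji or katakana directly before "を変えます" or "を変える".

```python
import re

# Hypothetical pattern (an assumption, not taken from the dashboard):
# a run of kanji or katakana, then the object particle "を",
# then the verb "変えます" / "変える" (to change).
OBJECT_CHANGE = re.compile(r"([一-龯ァ-ヶー]+)を変え(?:ます|る)")


def changed_objects(text):
    """Return every noun that appears as the object of 変えます/変える."""
    return [match.group(1) for match in OBJECT_CHANGE.finditer(text)]


if __name__ == "__main__":
    samples = [
        "右前方にカラーコーンがあるので、少し右にコースを変えます。",
        "評価を変えます",
    ]
    for sample in samples:
        print(sample, "->", changed_objects(sample))
```

On the two phrases from the explanation, this returns "コース" (course) and "評価" (evaluation) as the changed objects.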

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

 虽然  0.84
 थी  0.78
 但是  0.73
 नहीं  0.72
雖然  0.72
 不过  0.71
的大  0.70
 notwithstanding  0.70
össä  0.68
 Didn  0.68

Positive Logits

することで  0.72
TH  0.69
 możemy  0.69
イオン  0.68
サ  0.68
メカ  0.66
できる  0.66
Co  0.65
どのように  0.65
いくつかの  0.65

Activations Density 0.000%
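
To put the displayed density in context, the configuration figures can be combined arithmetically. This is a rough sketch under two assumptions that the page does not confirm: that the density is the share of tokens on which the feature fires across the 238,145 prompts of 512 tokens each, and that the displayed value is rounded to three decimal places. The function names are hypothetical.

```python
NUM_PROMPTS = 238_145      # from the Configuration section
TOKENS_PER_PROMPT = 512    # from the Configuration section


def total_tokens(num_prompts=NUM_PROMPTS, tokens_per_prompt=TOKENS_PER_PROMPT):
    """Total number of token positions covered by the prompt set."""
    return num_prompts * tokens_per_prompt


def max_active_tokens(total, displayed_decimals=3):
    """Largest active-token count whose percentage still rounds to zero.

    Assumes (not confirmed by the page) that the percentage is rounded
    to `displayed_decimals` decimal places.
    """
    half_step_percent = 0.5 * 10 ** -displayed_decimals
    return int(total * half_step_percent / 100)


if __name__ == "__main__":
    total = total_tokens()
    print(total)                     # 121930240
    print(max_active_tokens(total))  # 609
```

Under those assumptions, a displayed density of 0.000% would correspond to the feature firing on at most a few hundred of roughly 122 million tokens.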