INDEX

Explanations

common sentence completions

np_acts-logits-general · gemini-2.5-flash-lite

formatting and structural cues in prompts and dialogues, such as section labels, list items, numbering, and emphasized elements

oai_token-act-pair · gpt-5 Triggered by @vetterc0

The marked tokens appear at the start of the model's turn in conversational exchanges, often capturing the first substantive word or phrase of the AI assistant's response. The pattern mainly highlights response openers, key content words in answers (especially nouns, verbs, and important phrases), and transitions between ideas. It occasionally also marks formatting elements or punctuation that structure the response. The marked tokens appear to be the semantically significant ones that set the direction or core meaning of the assistant's reply.

eleuther_acts_top20 · claude-4-5-sonnet Triggered by @jamesnaruto04

Configuration

google/gemma-scope-2-27b-it/resid_post/layer_31_width_262k_l0_medium

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1
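The configuration identifier above appears to follow a path-like naming scheme: a release name, a hook site (presumably the residual stream after the block, "resid_post"), and a variant string encoding layer, dictionary width, and an L0 sparsity tier. The sketch below splits such an identifier into fields. The layout it expects, the class SaeId, and the function parse_sae_id are assumptions made for illustration, not part of any official tooling. The width label is kept as a string because labels such as "262k" are rounded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SaeId:
    release: str      # e.g. "google/gemma-scope-2-27b-it"
    hook_site: str    # e.g. "resid_post" (assumed: residual stream after the block)
    layer: int        # transformer layer index
    width_label: str  # dictionary size label, e.g. "262k" (rounded, not exact)
    l0_label: str     # sparsity tier label, e.g. "medium"


def parse_sae_id(sae_id: str) -> SaeId:
    """Split a path-like SAE identifier, assuming the (hypothetical) layout
    <release...>/<hook_site>/layer_<n>_width_<w>_l0_<tier>."""
    parts = sae_id.split("/")
    if len(parts) < 3:
        raise ValueError(f"unexpected SAE id: {sae_id!r}")
    *release_parts, hook_site, variant = parts
    tokens = variant.split("_")
    # Pair alternating keys and values: layer/31, width/262k, l0/medium.
    fields = dict(zip(tokens[0::2], tokens[1::2]))
    return SaeId(
        release="/".join(release_parts),
        hook_site=hook_site,
        layer=int(fields["layer"]),
        width_label=fields["width"],
        l0_label=fields["l0"],
    )


cfg = parse_sae_id("google/gemma-scope-2-27b-it/resid_post/layer_31_width_262k_l0_medium")
print(cfg)
```

If other identifiers in the same release use a different variant layout, the key/value pairing step is the part that would need to change.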

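Negative Logits

" основные"  0.55  (Russian, "main, basic")
" отдельные"  0.54  (Russian, "separate, individual")
"長期"  0.52  (Japanese/Chinese, "long-term")
"൭"  0.49  (Malayalam digit seven)
" संस्थ"  0.49  (Hindi word stem, as in "institution")
" repert"  0.49  (word fragment)
" વિભાગ"  0.48  (Gujarati, "department")
" மாவட்ட"  0.47  (Tamil, "district")
" collectivités"  0.47  (French, "local authorities")
"กลุ่ม"  0.46  (Thai, "group")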

Positive Logits

" chicken"  0.71
" chocolate"  0.71
" cheese"  0.69
" Cheese"  0.68
" cows"  0.67
" pizza"  0.65
" Pokemon"  0.65
" beer"  0.64
" joke"  0.63
" bacon"  0.63
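Logit lists like the two above are commonly produced by projecting a latent's decoder direction through the model's unembedding matrix and ranking the vocabulary by the resulting scores. The sketch below shows that idea on toy data. It assumes this is how the lists were computed (not confirmed by the page), ignores any final layer norm or scaling, and uses a hypothetical function name, top_logit_effects. Scores here are signed, so suppressed tokens come out negative.

```python
import numpy as np


def top_logit_effects(w_dec_row, w_unembed, vocab, k=10):
    """Rank vocabulary tokens by the direct effect of one SAE latent on the logits.

    w_dec_row: (d_model,) decoder direction for the latent.
    w_unembed: (d_model, vocab_size) unembedding matrix.
    vocab:     list of token strings, length vocab_size.
    Final layer norm and any scaling are ignored in this sketch.
    """
    scores = w_dec_row @ w_unembed            # (vocab_size,) one score per token
    order = np.argsort(scores)                # ascending: most suppressed first
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return positive, negative


# Toy example: d_model = 2, four-token vocabulary, made-up weights.
vocab = [" chicken", " основные", " the", " of"]
w_dec_row = np.array([1.0, 0.0])
w_unembed = np.array([[0.7, -0.5, 0.1, 0.0],
                      [0.3, 0.2, -0.4, 0.9]])
positive, negative = top_logit_effects(w_dec_row, w_unembed, vocab, k=1)
print(positive)  # [(' chicken', 0.7)]
print(negative)  # [(' основные', -0.5)]
```

With a real model, w_unembed would have one column per vocabulary entry, so the argsort runs over the full vocabulary.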

Activation Density: 0.101%
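Activation density is read here as the fraction of token positions on which the latent fires (activation above zero). The sketch computes that fraction on toy data and then gives a rough sense of scale. It assumes, without confirmation from the page, that the 0.101% figure was measured over the 238,145 dashboard prompts of 512 tokens each. The function name activation_density and the zero threshold are illustrative choices.

```python
import numpy as np


def activation_density(acts: np.ndarray, threshold: float = 0.0) -> float:
    """Fraction of token positions where the latent's activation exceeds threshold."""
    return float(np.mean(acts > threshold))


# Toy activations for 8 token positions: the latent fires on 2 of them.
acts = np.array([0.0, 0.0, 3.1, 0.0, 0.0, 0.0, 0.4, 0.0])
print(activation_density(acts))  # 0.25

# Back-of-envelope scale, assuming (unconfirmed) that the 0.101% figure
# was measured over the dashboard prompts listed in the configuration.
total_tokens = 238_145 * 512
approx_active = round(total_tokens * 0.101 / 100)
print(total_tokens, approx_active)  # 121930240 123150
```

Under that assumption, the latent would be active on roughly 123 thousand of about 122 million tokens.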

No Known Activations
