Explanations

describing a reaction or sequence

np_acts-logits-general · gemini-2.5-flash-lite

technical and domain-specific terminology that establishes the central topic or subject area of academic and professional text.

oai_token-act-pair · claude-4-5-haiku Triggered by @jamesnaruto04

The marked tokens across these examples indicate instances where special markers are applied to noun phrases, proper nouns, clausal elements, and multi-word sequences that appear to be highlighted for emphasis or structural importance within generated text. The pattern shows markers applied to subjects of sentences, key descriptive phrases, organizational names, locations, and technical terms that carry semantic weight in their respective contexts.

eleuther_acts_top20 · claude-4-5-haiku Triggered by @jamesnaruto04

analytical or formal discourse

np_max-act · claude-4-5-haiku Triggered by @jamesnaruto04

Tokens related to describing abstract concepts, qualities, characteristics, or systemic features in formal or academic writing, particularly when discussing complex topics, problems, approaches, impacts, or analytical frameworks.

eleuther_acts_top20 · claude-4-5-sonnet Triggered by @jamesnaruto04

Configuration

google/gemma-scope-2-27b-it/resid_post/layer_31_width_16k_l0_medium

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

nagyon        1.05
veldig        1.05
がたくさん    0.98
खूप           0.97
trochę        0.92
خیلی          0.84
很多          0.83
people        0.82
みんな        0.81
いろいろ      0.81

Positive Logits

столь         0.82
посредством   0.76
নিঃসন্দেহে    0.75
aisément      0.75
पश्चात        0.75
ব্যতীত        0.71
কিংবা         0.71
lediglich     0.70
অতঃপর         0.69
প্রসঙ্গে      0.68

Activations Density 0.624%
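
To put the density figure in context, the sketch below estimates how many token positions in the dashboard prompt set it corresponds to. It assumes that the density is the share of token positions on which the feature has a non-zero activation, and that it was measured over the 238,145 prompts of 512 tokens listed under Configuration; neither assumption is stated on this page. The function name estimate_active_tokens is hypothetical.

```python
# Figures taken from the Configuration section and the density line.
PROMPTS = 238_145
TOKENS_PER_PROMPT = 512
DENSITY_PERCENT = 0.624


def estimate_active_tokens(prompts, tokens_per_prompt, density_percent):
    """Return (total token positions, estimated positions where the feature fires).

    Assumption: density is the percentage of token positions with a
    non-zero activation, measured over this same prompt set.
    """
    total_tokens = prompts * tokens_per_prompt
    active_tokens = round(total_tokens * density_percent / 100)
    return total_tokens, active_tokens


total_tokens, active_tokens = estimate_active_tokens(
    PROMPTS, TOKENS_PER_PROMPT, DENSITY_PERCENT
)
print(f"total token positions: {total_tokens:,}")
print(f"estimated active positions: {active_tokens:,}")
```

Under these assumptions the feature would fire on roughly 760,000 of about 122 million token positions, which is consistent with a sparse feature.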
