Explanations

conjunctions and transitional phrases that signify contrast or condition

oai_token-act-pair · gpt-4o-mini Triggered by @bot

Tokens after conjunctions

np_acts-logits-general · gemini-2.0-flash

various conjunctions and adverbs

np_acts-logits-general · gemini-2.5-flash-lite

The marked tokens appear to be fragments of words that have been split across delimiters, often appearing within proper nouns, technical terms, or compound words in diverse academic and technical texts. The patterns suggest these are either OCR/text encoding artifacts, reference citations within brackets, or deliberate word segmentation where parts of a single word are delimited separately from their surrounding context.

eleuther_acts_top20 · claude-4-5-haiku Triggered by @jacktpayne51

Top Features by Cosine Similarity

Comparing With GEMMA-2-2B @ 20-gemmascope-res-16k

Configuration

google/gemma-scope-2b-pt-res/layer_20/width_16k/average_l0_71

Prompts (Dashboard)

36,864 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Features

16,384

Data Type

float32

Hook Name

blocks.20.hook_resid_post

Hook Layer

20

Architecture

jumprelu

Context Size

1,024

Dataset

monology/pile-uncopyrighted

Activation Function

relu
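The configuration above lists "jumprelu" as the architecture and "relu" as the activation function. As a rough illustration of how the two differ, here is a minimal sketch in plain Python. The threshold value is hypothetical and chosen only for the example; it is not taken from this SAE, which would carry its own learned thresholds.

```python
def relu(x: float) -> float:
    """Standard ReLU: negative pre-activations become zero."""
    return x if x > 0.0 else 0.0


def jumprelu(x: float, threshold: float) -> float:
    """JumpReLU: pass x through unchanged only if it exceeds the threshold."""
    return x if x > threshold else 0.0


# Hypothetical pre-activations and threshold, for illustration only.
pre_acts = [-0.5, 0.2, 0.9, 1.5]
theta = 0.8

relu_out = [relu(x) for x in pre_acts]
jump_out = [jumprelu(x, theta) for x in pre_acts]

print(relu_out)
print(jump_out)
```

With these inputs, ReLU keeps the small positive value 0.2, while JumpReLU zeroes it because it falls below the threshold. Values above the threshold pass through at full magnitude in both cases.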

Negative Logits

fant

-0.65

freiheit

-0.57

dib

-0.55

fa

-0.52

 enda

-0.51

 Vogt

-0.50

 Griffin

-0.49

Nuorodos

-0.49

Šaltiniai

-0.49

geräte

-0.49

Positive Logits

AsUp

1.13

 تضيفلها

0.77

MeasureSpec

0.75

 auroit

0.71

ſelf

0.71

makeConstraints

0.71

SuppressMessage

0.71

 useStyles

0.68

HasAnnotation

0.67

 makeStyles

0.67

Activations Density 0.542%
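To give the density figure some scale, the sketch below assumes that "activations density" means the fraction of token positions on which the feature has a nonzero activation, and that it is measured over the dashboard prompts listed in the configuration (36,864 prompts of 128 tokens each). Neither assumption is confirmed by this page, and the function name is hypothetical.

```python
def activation_density(activations: list[float]) -> float:
    """Fraction of token positions where the feature fires (activation > 0)."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > 0.0)
    return active / len(activations)


# Toy example: the feature fires on 1 of 4 token positions.
toy_density = activation_density([0.0, 0.0, 1.2, 0.0])

# Scale implied by the dashboard numbers, under the assumptions stated above.
total_tokens = 36_864 * 128
implied_active = round(total_tokens * 0.542 / 100)

print(toy_density)
print(total_tokens, implied_active)
```

Under those assumptions, a density of 0.542% would correspond to roughly 25,575 active token positions out of 4,718,592.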
