Explanations

occurrences of recognizable entities, including names and places

oai_token-act-pair · gpt-4o-mini Triggered by @bot

dates and years, particularly from the late 19th to early 21st centuries.

oai_token-act-pair · claude-3-5-sonnet-20240620 Triggered by @nrbln

years between 1900 and 2099.

oai_token-act-pair · gemini-1.5-pro Triggered by @nrbln
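As a rough illustration of what "years between 1900 and 2099" means as a text pattern, the sketch below matches four-digit years in that range. It is only an approximation of the explanation's wording, not a model of the feature itself; the function name `find_years` and the sample sentence are hypothetical.

```python
import re

# Four-digit years from 1900 through 2099, bounded so that longer
# digit runs (e.g. "120245") are not matched in the middle.
YEAR_PATTERN = re.compile(r"\b(?:19|20)\d{2}\b")


def find_years(text):
    """Return every 1900-2099 year string found in `text`, in order."""
    return YEAR_PATTERN.findall(text)


sample = "Founded in 1987, revised 2101 and 2024."
years = find_years(sample)  # 2101 falls outside the range and is skipped
```

A pattern like this can serve as a quick baseline when checking whether an explanation is too narrow: the other explanations above also mention names, places, and the word "the", none of which this pattern covers.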

Dates and years are often marked as important, especially when referring to historical events or founding dates of organizations. The word "the" is frequently highlighted, particularly when preceding important nouns or concepts. Some examples also emphasize personal pronouns like "it" or "she" when referring to organizations or individuals.

eleuther_acts_top20 · claude-3-5-sonnet-20240620 Triggered by @nrbln

The marked words often include references to years, centuries, ordinal numbers, and unfilled placeholders within numerical or formatted contexts, highlighting missing or specific numeric or positional content frequently tied to historical, chronological, or formal identification.

eleuther_acts_top20 · gpt-4o Triggered by @nrbln

names, dates, or objects

np_acts-logits-general · gemini-2.5-flash-lite

Top Features by Cosine Similarity

Comparing With GEMMA-2-9B-IT @ 20-gemmascope-res-131k

Configuration

google/gemma-scope-9b-it-res/layer_20/width_131k/average_l0_81

Prompts (Dashboard): 24,576 prompts, 128 tokens each
Dataset (Dashboard): monology/pile-uncopyrighted
Features: 131,072
Data Type: float32
Hook Name: blocks.20.hook_resid_post
Hook Layer: 20
Architecture: jumprelu
Context Size: 1,024
Dataset: monology/pile-uncopyrighted
Activation Function: relu
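The configuration above lists jumprelu as the architecture and relu as the activation function. The sketch below shows how the two differ in general: ReLU zeroes negative pre-activations, while JumpReLU zeroes anything at or below a per-feature threshold. The threshold values and function names here are hypothetical and are not taken from this SAE's weights.

```python
def relu(pre_acts):
    """Standard ReLU: keep positive values, zero out the rest."""
    return [max(0.0, x) for x in pre_acts]


def jumprelu(pre_acts, thresholds):
    """JumpReLU: pass a value through only if it exceeds its threshold.

    `thresholds` holds one non-negative threshold per feature
    (illustrative values only).
    """
    if len(pre_acts) != len(thresholds):
        raise ValueError("pre_acts and thresholds must have the same length")
    return [x if x > t else 0.0 for x, t in zip(pre_acts, thresholds)]


pre_acts = [0.5, 1.5, -1.0]
thresholds = [1.0, 1.0, 1.0]  # hypothetical thresholds

relu_out = relu(pre_acts)                      # small positives survive
jumprelu_out = jumprelu(pre_acts, thresholds)  # only values above 1.0 survive
```

With a threshold of zero for every feature, the two functions agree, which is one possible reason a dashboard could report both labels for the same SAE.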


Negative Logits

Token               Logit
"ActionCreators"    -0.38
" psychi"           -0.38
"fizierung"         -0.37
" utanför"          -0.37
" neutre"           -0.35
" fusil"            -0.35
" Autorisations"    -0.35
" Niedersachsen"    -0.35
" minutes"          -0.34
" charla"           -0.34

Positive Logits

Token                Logit
"uxxxx"              0.68
"ArrowToggle"        0.61
" ModelExpression"   0.60
"EndContext"         0.56
" للاسماء"           0.52
"LookAnd"            0.51
"tanleria"           0.51
"thâu"               0.49
"AndEndTag"          0.48
"hyrchwyd"           0.47

Activations Density 0.039%
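To put the density figure in perspective, the sketch below assumes that activation density means the fraction of dashboard tokens on which the feature has a nonzero activation. That definition is an assumption, as is combining it with the prompt counts from the configuration; the function and variable names are hypothetical.

```python
def activation_density(activations):
    """Fraction of tokens on which the feature is active (activation > 0)."""
    if not activations:
        raise ValueError("activations must not be empty")
    active = sum(1 for a in activations if a > 0)
    return active / len(activations)


# Figures quoted on this page.
PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
REPORTED_DENSITY = 0.039 / 100  # 0.039% expressed as a fraction

total_tokens = PROMPTS * TOKENS_PER_PROMPT
# Rough number of tokens the feature would fire on under the assumption above.
approx_active_tokens = round(total_tokens * REPORTED_DENSITY)
```

Under these assumptions the feature would fire on roughly a thousand of the three million or so dashboard tokens, which is consistent with a sparse feature.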


No Known Activations
