Explanations

damage or disease

np_acts-logits-general · gemini-2.5-flash-lite

health-related content discussing medical conditions, treatments, and preventative measures.

oai_token-act-pair · claude-3-7-sonnet-20250219 Triggered by @neilrathi

The neuron fires on health-related benefit terms: words describing protective or therapeutic effects (e.g., cardiovascular health, neutralize free radicals, lower cholesterol, protect teeth).

oai_token-act-pair · o4-mini Triggered by @jyhe0408

Configuration

google/gemma-scope-27b-pt-res/layer_22/width_131k

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token             Logit
"to"              -1.37
" Pemain"         -1.24
"Pranala"         -1.23
" hébergement"    -1.21
"וּ"               -1.16
"return"          -1.14
" exasper"        -1.13
"unno"            -1.13
" каждому"        -1.11
" confortable"    -1.10

Positive Logits

Token             Logit
" particularly"   1.40
" especially"     1.38
" very"           1.32
" poliester"      1.30
(token not shown) 1.25
" increased"      1.21
"ございません"     1.21
" decreases"      1.20
" prevents"       1.18
"you"             1.16
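
Logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and reporting the most negative and most positive entries. This page does not say how its values were computed, so the following is only a minimal sketch under that assumption, run on tiny random matrices rather than the real model. The names `logit_effects`, `top_logit_tokens`, `w_dec`, and `w_unembed` are hypothetical and do not come from any library.

```python
import random

def logit_effects(w_dec, w_unembed):
    """Project a decoder direction (length d_model) through an
    unembedding matrix (d_model rows x vocab_size columns)."""
    vocab_size = len(w_unembed[0])
    return [
        sum(w_dec[d] * w_unembed[d][v] for d in range(len(w_dec)))
        for v in range(vocab_size)
    ]

def top_logit_tokens(logits, vocab, k=3):
    """Return the k most negative and k most positive (token, logit) pairs."""
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    negative = ranked[:k]            # smallest values first
    positive = ranked[::-1][:k]      # largest values first
    return negative, positive

# Toy data only: random weights standing in for a real model (assumption).
random.seed(0)
d_model, vocab_size = 8, 20
vocab = [f"tok{i}" for i in range(vocab_size)]
w_dec = [random.gauss(0, 1) for _ in range(d_model)]
w_unembed = [[random.gauss(0, 1) for _ in range(vocab_size)] for _ in range(d_model)]

negative, positive = top_logit_tokens(logit_effects(w_dec, w_unembed), vocab)
print("negative:", negative)
print("positive:", positive)
```

Under this reading, a large positive value means the feature pushes the model toward predicting that token next, and a large negative value means it pushes away from it.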

Activations Density 0.022%
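
To put the density figure in concrete terms, the arithmetic below combines it with the prompt count and prompt length listed under Configuration. It assumes that density means the fraction of dashboard tokens on which the feature is active, which the page does not define, and that the displayed 0.022% is a rounded value.

```python
# Figures taken from the page.
prompts = 24_576
tokens_per_prompt = 128
density = 0.022 / 100  # 0.022%, assumed to be a per-token firing fraction

total_tokens = prompts * tokens_per_prompt
approx_active_tokens = total_tokens * density

print(total_tokens)                 # total tokens across all dashboard prompts
print(round(approx_active_tokens))  # rough count of tokens where the feature fires
```

Under that assumption the feature would be active on roughly 700 of about 3.1 million tokens.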
