Explanations

reducing negative health outcomes

np_acts-logits-general · gemini-2.5-flash-lite

phrases related to reducing health problems, particularly discussing how substances lower cholesterol, stress, or disease risk.

oai_token-act-pair · claude-3-7-sonnet-20250219 Triggered by @neilrathi

The neuron activates on verbs and keywords that describe reducing, neutralizing, inhibiting, or otherwise lowering harmful substances or effects (e.g. “neutralize,” “reduce,” “lower,” “inhibit”).

oai_token-act-pair · o4-mini Triggered by @jyhe0408

Configuration

google/gemma-scope-27b-pt-res/layer_34/width_131k

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

加減: -0.81
GUID: -0.80
 AdapterView: -0.79
 uninterrupted: -0.78
zyst: -0.75
uman: -0.75
HTH: -0.75
 stalked: -0.75
cenie: -0.75
 herida: -0.75

Positive Logits

 oxidative: 1.36
 inflammation: 1.30
 przechowy: 1.05
 certain: 1.02
 several: 1.00
 both: 0.97
Signs: 0.97
 many: 0.97
許多: 0.95
 både: 0.94

Activations Density 0.044%
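
To give the density figure a sense of scale, the short sketch below converts it into an approximate token count. It assumes, and the page does not confirm this, that the density is the fraction of dashboard tokens on which the feature has a nonzero activation, measured over the 24,576 prompts of 128 tokens each listed under Configuration. The function and variable names are illustrative only.

```python
# Rough scale check for the activation density.
# ASSUMPTION: density = (tokens with nonzero activation) / (all dashboard tokens),
# measured over the prompt set listed under Configuration.

NUM_PROMPTS = 24_576        # from "Prompts (Dashboard)"
TOKENS_PER_PROMPT = 128     # from "Prompts (Dashboard)"
DENSITY_PERCENT = 0.044     # from "Activations Density"


def estimate_active_tokens(num_prompts: int, tokens_per_prompt: int,
                           density_percent: float) -> tuple[int, int]:
    """Return (total tokens, estimated number of tokens where the feature fires)."""
    total = num_prompts * tokens_per_prompt
    active = round(total * density_percent / 100)
    return total, active


total_tokens, active_tokens = estimate_active_tokens(
    NUM_PROMPTS, TOKENS_PER_PROMPT, DENSITY_PERCENT
)
print(f"total tokens: {total_tokens:,}")
print(f"estimated active tokens: {active_tokens:,}")
```

Under that assumption the feature would fire on roughly one token in every 2,300, which is consistent with a sparse, narrowly scoped feature.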

No Known Activations
