Explanations

dirt, debris, harmful, virus

np_acts-logits-general · gemini-2.5-flash-lite

words related to contaminants or harmful substances that need to be filtered, removed, or protected against.

oai_token-act-pair · claude-3-7-sonnet-20250219 Triggered by @neilrathi

The neuron activates on words naming unwanted particulates or contaminants (e.g. dirt, dust, debris, germs, bacteria, soot).

oai_token-act-pair · o4-mini Triggered by @jyhe0408

Configuration

google/gemma-scope-27b-pt-res/layer_10/width_131k

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted
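The configuration refers to a Gemma Scope sparse autoencoder (SAE) trained on the layer-10 residual stream of pretrained Gemma 2 27B, with roughly 131k latents. Gemma Scope SAEs use a JumpReLU activation. The following is a minimal sketch that assumes this architecture. It shows how a latent's activation could be computed from a residual-stream vector: a linear encoder, followed by a per-latent threshold that zeroes any pre-activation at or below it. The dimensions and weights are random toy values, not the released parameters, and the function names `jumprelu_encode` and `decode` are hypothetical.

```python
import numpy as np

def jumprelu_encode(x, W_enc, b_enc, threshold):
    """Encode a residual-stream vector into SAE latent activations.

    JumpReLU keeps a pre-activation unchanged if it exceeds that latent's
    threshold and sets it to exactly zero otherwise.
    """
    pre = x @ W_enc + b_enc                      # shape (d_sae,)
    return np.where(pre > threshold, pre, 0.0)

def decode(f, W_dec, b_dec):
    """Reconstruct the residual-stream vector from latent activations."""
    return f @ W_dec + b_dec

# Toy dimensions and random weights (NOT the real Gemma Scope parameters).
rng = np.random.default_rng(0)
d_model, d_sae = 8, 16
W_enc = rng.normal(size=(d_model, d_sae))
b_enc = np.zeros(d_sae)
threshold = np.full(d_sae, 0.5)                  # hypothetical per-latent thresholds
W_dec = rng.normal(size=(d_sae, d_model))
b_dec = np.zeros(d_model)

x = rng.normal(size=d_model)                     # stand-in residual-stream vector
f = jumprelu_encode(x, W_enc, b_enc, threshold)
x_hat = decode(f, W_dec, b_dec)
print("active latents:", np.flatnonzero(f))
```

Because inactive latents are exactly zero under JumpReLU, counting nonzero entries of `f` over many tokens is a natural way to measure how often a latent fires.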

Negative Logits

"Alguns"  -1.39
" видит"  -1.38
"☆、"  -1.36
" Ultimately"  -1.35
" Shortly"  -1.34
"（"  -1.30
" departement"  -1.30
" Knowing"  -1.28
"it"  -1.27
"Horário"  -1.27

Positive Logits

"you"  1.41
" musisz"  1.41
" innym"  1.38
" godziny"  1.35
" pomaga"  1.34
" your"  1.34
" pozosta"  1.34
" jazdy"  1.34
" eerste"  1.33
" ใน"  1.33
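The negative and positive logit lists rank vocabulary tokens by how strongly this latent pushes their next-token logits down or up. One common way to produce such lists is to project the latent's decoder direction directly onto the model's unembedding matrix and take the lowest and highest scores. This page does not confirm that method, so it is treated here as an assumption. The sketch uses a hand-built two-dimensional toy model with four made-up tokens. `top_logit_tokens` is a hypothetical helper, and a real pipeline may also account for the final normalization layer.

```python
import numpy as np

def top_logit_tokens(decoder_dir, W_U, vocab, k=10):
    """Rank tokens by the direct effect of a latent's decoder direction on
    the logits: the k most negative and the k most positive."""
    scores = decoder_dir @ W_U          # one score per vocabulary token
    order = np.argsort(scores)          # ascending: most negative first
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy model: d_model = 2, four made-up tokens (NOT Gemma's vocabulary).
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
W_U = np.array([[1.4, 1.3, -1.2, -1.3],    # unembedding, shape (d_model, vocab)
                [0.5, -0.2, 0.1, 0.0]])
decoder_dir = np.array([1.0, 0.0])         # hypothetical decoder row for one latent

negative, positive = top_logit_tokens(decoder_dir, W_U, vocab, k=2)
print("negative:", negative)  # negative: [('tok_d', -1.3), ('tok_c', -1.2)]
print("positive:", positive)  # positive: [('tok_a', 1.4), ('tok_b', 1.3)]
```

This is a direct-path projection. It describes what the decoder direction would add to the logits if passed straight to the unembedding, and it ignores any effect the latent has through later layers.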

Activation Density: 0.075%
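Activation density is the fraction of token positions on which the latent fires. The sketch below assumes that "fires" means a strictly positive activation, as with a JumpReLU SAE, where inactive latents are exactly zero. `activation_density` is a hypothetical helper, and the toy activations are made up.

The last lines give a back-of-envelope estimate. It holds only if the density was measured over the dashboard sample of 24,576 prompts × 128 tokens. Under that assumption, about 2,359 of 3,145,728 tokens would be active, with some slack because the displayed 0.075% is rounded.

```python
import numpy as np

def activation_density(acts):
    """Fraction of token positions where the latent's activation is > 0."""
    acts = np.asarray(acts)
    return np.count_nonzero(acts > 0) / acts.size

# Made-up activations for 4 prompts x 5 tokens; only two positions fire.
acts = np.zeros((4, 5))
acts[1, 2] = 3.1
acts[3, 0] = 0.8
print(f"{activation_density(acts):.1%}")  # 10.0%

# Back-of-envelope, ASSUMING the density was measured over the dashboard
# sample (24,576 prompts x 128 tokens each).
total_tokens = 24_576 * 128
approx_active = round(total_tokens * 0.00075)  # 0.075% as a fraction
print(total_tokens, approx_active)  # 3145728 2359
```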

No Known Activations
