Explanations

carcinogenic and mutagenic potential

np_acts-logits-general · gemini-2.5-flash-lite

Words related to toxicity, specifically terms such as "carcinogenic," "mutagenic," and "teratogenic" that describe the harmful effects of substances.

oai_token-act-pair · claude-3-7-sonnet-20250219 Triggered by @neilrathi

The neuron flags technical terms naming chemical hazards or toxic effects (e.g., carcinogenic, mutagenic, teratogenic, genotoxic).

oai_token-act-pair · o4-mini Triggered by @jyhe0408

Configuration

google/gemma-scope-27b-pt-res/layer_10/width_131k

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

"了许多"     -1.37
"co"         -1.36
"了不少"     -1.35
" tinte"     -1.33
" welchen"   -1.32
"us"         -1.32
"了很多"     -1.31
"菪"         -1.30
"king"       -1.27
"lin"        -1.27

Positive Logits

" ersten"    1.37
"you"        1.34
" efekt"     1.23
" reklama"   1.22
" kérdés"    1.22
" zeggen"    1.21
"Inggris"    1.21
"braz"       1.20
"mercado"    1.15
"emplares"   1.15

Activations Density 0.098%
