Explanations

animal welfare and harm

np_acts-logits-general · gemini-2.5-flash-lite

content related to animal welfare, rights, and protection.

oai_token-act-pair · claude-3-7-sonnet-20250219 Triggered by @neilrathi

The neuron activates on words referring to animals or animal‐related concepts (e.g. livestock, animals, animal welfare).

oai_token-act-pair · o4-mini Triggered by @jyhe0408

Configuration

google/gemma-scope-27b-pt-res/layer_22/width_131k

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token               Logit
(empty in source)   -1.52
(empty in source)   -1.41
"for"               -1.41
"en"                -1.38
" like"             -1.38
"in"                -1.36
(empty in source)   -1.32
" will"             -1.30
(empty in source)   -1.26
"man"               -1.23

Positive Logits

Token            Logit
"fokus"          1.77
"kalender"       1.70
"moderne"        1.62
"denk"           1.54
"denken"         1.54
"dekor"          1.53
" minimale"      1.53
"gelang"         1.52
" provinciale"   1.49
"klima"          1.48

Activations Density 0.030%
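
To give the density figure a sense of scale, the sketch below converts it into an approximate token count. It assumes, without confirmation from this page, that the density is the fraction of tokens in the dashboard prompt set (24,576 prompts of 128 tokens each) on which the feature is active. The function name `expected_active_tokens` is hypothetical and used only for illustration.

```python
# Figures taken from the Configuration and Activations Density entries on this page.
N_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.030


def expected_active_tokens(n_prompts, tokens_per_prompt, density_percent):
    """Return (total tokens, estimated tokens on which the feature is active).

    Assumption: density is the share of all dashboard tokens with a
    non-zero activation. The page does not state this explicitly.
    """
    total = n_prompts * tokens_per_prompt
    return total, total * density_percent / 100.0


total_tokens, active_tokens = expected_active_tokens(
    N_PROMPTS, TOKENS_PER_PROMPT, DENSITY_PERCENT
)
print(f"total tokens:     {total_tokens:,}")
print(f"estimated active: {round(active_tokens):,}")
```

Under that assumption the feature would fire on roughly a thousand of about three million tokens, which fits a narrow, topic-specific feature.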
