Explanations

division and polarization

np_acts-logits-general · gemini-2.5-flash-lite

content related to political division, polarization, and social tensions.

oai_token-act-pair · claude-3-7-sonnet-20250219 Triggered by @neilrathi

This neuron fires strongly on words and phrases related to division or polarization (e.g. divide, divided, polarization).

oai_token-act-pair · o4-mini Triggered by @jyhe0408

Configuration

google/gemma-scope-27b-pt-res/layer_34/width_131k

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

 révé    -0.95
ḿ    -0.93
%%%%%%%%%    -0.93
Compress    -0.85
 partenaire    -0.82
 préoccup    -0.82
 municípios    -0.82
 kooper    -0.82
 lumineux    -0.81
şık    -0.79

Positive Logits

 divisive    2.17
 divide    1.99
 polar    1.98
 division    1.88
 polarization    1.75
 divides    1.73
 dividing    1.71
 instig    1.70
 inflammatory    1.67
 divisions    1.66

Activations Density 0.036%
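To give the density figure a sense of scale, the short sketch below converts it into an approximate token count. It assumes that the density is the fraction of token positions on which the feature is active, and that it was measured over the dashboard prompts listed under Configuration; the page does not state either point, so treat the result as a rough estimate. The function name `estimate_active_tokens` is hypothetical and used only for illustration.

```python
# Values copied from this page (Configuration and Activations Density).
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.036  # as displayed on the page, so already rounded


def estimate_active_tokens(num_prompts, tokens_per_prompt, density_percent):
    """Return (total token positions, estimated active token positions).

    Assumption: density is the share of token positions with a nonzero
    activation, measured over exactly these prompts.
    """
    total_tokens = num_prompts * tokens_per_prompt
    active_tokens = total_tokens * density_percent / 100.0
    return total_tokens, active_tokens


total_tokens, active_tokens = estimate_active_tokens(
    NUM_PROMPTS, TOKENS_PER_PROMPT, DENSITY_PERCENT
)
print(f"total token positions: {total_tokens:,}")
print(f"estimated active positions: {active_tokens:.0f}")
```

Under these assumptions the feature is active on roughly 1,100 of about 3.1 million token positions, which fits a feature that responds to a narrow vocabulary of division-related words.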

No Known Activations
