Explanations

racial bias
np_max-act · gemini-2.0-flash

This neuron detects mentions of race and racial-group topics, especially content about racial identity, discrimination, representation, or related controversies.
oai_token-act-pair · gpt-5-mini · Triggered by @tatsatx

Explanation could not be parsed.
eleuther_acts_top20 · gpt-5-nano · Triggered by @tatsatx

Explanation could not be parsed.
eleuther_acts_top20 · gpt-5-mini · Triggered by @tatsatx

Configuration

andyrdt/saes-gpt-oss-20b/resid_post_layer_11/trainer_0

Dataset (Dashboard)

Various

Negative Logits

Token          Logit
" leaf"        -0.08
" losse"       -0.08
" folos"       -0.08
" postfix"     -0.07
"roy"          -0.07
" borrowed"    -0.07
" topo"        -0.07
" אית"         -0.07
" overw"       -0.07
"ויה"          -0.07

Positive Logits

Token              Logit
" racial"          0.20
"racial"           0.17
"黑人"             0.16
" racism"          0.16
" ethnic"          0.16
" ethnicity"       0.15
" racist"          0.15
" minorities"      0.14
" multicultural"   0.14
" LGBTQ"           0.14

Activations Density 0.291%
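
For readers unfamiliar with the density figure, the sketch below shows one common way such a number is computed. It assumes that density means the fraction of sampled token positions on which the feature's activation is above zero; the page itself does not state its definition. The function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and used only for illustration.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of tokens on which the feature fires.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical toy data: the feature fires on 1 of 8 token positions.
sample = [0.0, 0.0, 0.0, 3.2, 0.0, 0.0, 0.0, 0.0]
print(f"{activation_density(sample):.3%}")
```

Under this reading, a density of 0.291% would correspond to the feature firing on roughly 3 of every 1,000 sampled tokens.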

No Known Activations
