INDEX

Explanations

mentions and discussions of gender, specifically references to "men" and "women."

oai_token-act-pair · gpt-4.1-2025-04-14 Triggered by @bcywinski

references to women and girls

oai_token-act-pair · gpt-4o-mini Triggered by @bot

Configuration

google/gemma-scope-9b-pt-res/layer_23/width_131k/average_l0_101

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Features

131,072

Data Type

float32

Hook Name

blocks.23.hook_resid_post

Hook Layer

23

Architecture

jumprelu

Context Size

1,024

Dataset

monology/pile-uncopyrighted

Activation Function

relu
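The configuration above names `jumprelu` as the architecture. As a minimal sketch of what a JumpReLU activation does, compared with a plain ReLU: a pre-activation passes through unchanged only when it exceeds a per-feature threshold, and is zeroed otherwise. The function names, inputs, and threshold values below are hypothetical and chosen only for illustration; they are not taken from this SAE.

```python
def jumprelu(pre_acts, thresholds):
    """Keep each pre-activation only if it exceeds its own threshold."""
    return [z if z > t else 0.0 for z, t in zip(pre_acts, thresholds)]


def relu(pre_acts):
    """Plain ReLU for comparison: the threshold is fixed at zero."""
    return [max(z, 0.0) for z in pre_acts]


# Hypothetical pre-activations and thresholds (illustrative only).
pre = [-0.5, 0.2, 0.9, 1.5]
theta = [0.3, 0.3, 1.0, 1.0]

jump_out = jumprelu(pre, theta)
relu_out = relu(pre)
print(jump_out)
print(relu_out)
```

In this toy input, the small positive values that a ReLU would keep are zeroed by JumpReLU because they fall below their thresholds.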

Negative Logits

"ubscribe"     -0.41
"↴"            -0.39
" klara"       -0.38
"Check"        -0.38
"oros"         -0.37
"Crash"        -0.36
"Pug"          -0.36
"Qualified"    -0.35
"tunnel"       -0.35
"Newspaper"    -0.35

Positive Logits

" women"           0.82
" female"          0.74
"PreferredItem"    0.73
" woman"           0.72
" females"         0.68
" férfi"           0.64
" feminine"        0.64
"ValueStyle"       0.63
" ladies"          0.63
" male"            0.63
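One common way to produce per-token logit lists like the ones above is to project a feature's decoder direction through the model's unembedding matrix and rank the resulting values. The sketch below assumes that approach; the page itself does not state how its numbers were computed. All names (`logit_effects`, `top_and_bottom`) and all numbers are hypothetical toy values, not the dashboard's data.

```python
def logit_effects(decoder_dir, unembed):
    """Dot the decoder direction (length d_model) with each vocab column
    of the unembedding matrix (d_model rows, vocab columns)."""
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][v] for i in range(len(decoder_dir)))
        for v in range(vocab_size)
    ]


def top_and_bottom(effects, tokens, k):
    """Return the k most positive and k most negative (token, value) pairs."""
    ranked = sorted(zip(tokens, effects), key=lambda pair: pair[1], reverse=True)
    return ranked[:k], ranked[-k:][::-1]


# Toy setup: d_model = 2, vocabulary of 4 placeholder tokens.
tokens = ["a", "b", "c", "d"]
decoder_dir = [1.0, 0.0]
unembed = [
    [0.75, 0.5, -0.25, -0.5],
    [0.1, 0.2, 0.3, 0.4],
]

effects = logit_effects(decoder_dir, unembed)
positive, negative = top_and_bottom(effects, tokens, k=2)
print(positive)
print(negative)
```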

Activations Density 0.068%
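To give the density figure a sense of scale, the arithmetic below multiplies it by the size of the dashboard prompt set (24,576 prompts of 128 tokens each). This assumes the density was measured over that same prompt set, which the page does not confirm, so the result is only a rough estimate.

```python
# Figures copied from this page.
num_prompts = 24_576
tokens_per_prompt = 128
density = 0.068 / 100  # 0.068% expressed as a fraction

# Assumption: density is the fraction of dashboard tokens on which the feature fires.
total_tokens = num_prompts * tokens_per_prompt
expected_active_tokens = round(total_tokens * density)

print(total_tokens)
print(expected_active_tokens)
```

Under that assumption, the feature would fire on roughly two thousand of about 3.1 million tokens.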
