INDEX

Explanations

The substring "IND" within capitalized words, often names or abbreviations. There is also some evidence for the lowercase substring "ind" within words, whether or not the word is capitalized.
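
The pattern described in this explanation can be made concrete with a small check. This is only a minimal sketch, not the procedure the auto-interp model used: the function names `has_upper_ind` and `has_any_ind` are hypothetical, and "capitalized" is assumed here to mean that the word starts with an uppercase letter.

```python
def has_upper_ind(word: str) -> bool:
    """True if the word starts with an uppercase letter and contains "IND"."""
    return word[:1].isupper() and "IND" in word


def has_any_ind(word: str) -> bool:
    """True if the word contains "ind" in any letter case."""
    return "ind" in word.lower()


# Example words are illustrative only; they are not taken from the dashboard.
for w in ["INDYCAR", "India", "window", "barley"]:
    print(w, has_upper_ind(w), has_any_ind(w))
```

Keeping the two checks separate mirrors the explanation, which treats the uppercase "IND" case as the main pattern and the lowercase "ind" case as a weaker one.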

eleuther_acts_top20 · gemini-1.5-flash Triggered by @johnny

references to India or Indian identity

oai_token-act-pair · gpt-4o-mini Triggered by @bot

Configuration

neuronpedia/gpt2-small__res_slefr-ajt/2-res_slefr-ajt

Prompts (Dashboard)

12,288 prompts, 128 tokens each

Dataset (Dashboard)

Skylion007/openwebtext

Features

46,080

Data Type

torch.float32

Hook Point

blocks.2.hook_resid_pre

Architecture

standard

Context Size

128

Dataset

apollo-research/Skylion007-openwebtext-tokenizer-gpt2

Hook Point Layer

Activation Function

relu
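
The configuration values above can be gathered into a single record, which also makes it easy to derive the dictionary's expansion factor. This is a sketch under one stated assumption: the residual width `d_model = 768` is the standard value for GPT-2 small and is not listed on this page. The class name `SaeConfig` and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SaeConfig:
    hook_point: str
    n_features: int
    context_size: int
    dtype: str
    activation: str
    d_model: int  # assumption: GPT-2 small residual width, not shown on the page


cfg = SaeConfig(
    hook_point="blocks.2.hook_resid_pre",
    n_features=46_080,
    context_size=128,
    dtype="torch.float32",
    activation="relu",
    d_model=768,
)

# Expansion factor: dictionary features per residual-stream dimension.
expansion = cfg.n_features / cfg.d_model
print(expansion)
```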

Not in Any Lists

No Comments

Negative Logits

Token        Logit
"rum"        -0.53
"zek"        -0.48
"ris"        -0.47
"bra"        -0.46
"neys"       -0.46
" barley"    -0.44
"AUD"        -0.44
"Bob"        -0.44
"eff"        -0.44
"occ"        -0.43

Positive Logits

Token             Logit
"hower"           0.61
" Genocide"       0.56
" Participant"    0.53
"ergic"           0.52
"anguage"         0.51
"afety"           0.51
" Solitaire"      0.49
" Turing"         0.49
"utical"          0.48
"士"              0.48
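
Token lists like the ones above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and sorting the result. Whether this dashboard computes its values in exactly this way is an assumption. The sketch below uses random toy data and hypothetical names (`decoder_dir`, `unembed`), so it shows only the shape of the computation, not the numbers on this page.

```python
import random

random.seed(0)
D_MODEL, VOCAB = 8, 20  # toy sizes, far smaller than a real model

# Hypothetical decoder direction for one feature and a toy unembedding matrix.
decoder_dir = [random.gauss(0, 1) for _ in range(D_MODEL)]
unembed = [[random.gauss(0, 1) for _ in range(VOCAB)] for _ in range(D_MODEL)]

# Logit effect per vocabulary entry: dot product of the decoder direction
# with the corresponding column of the unembedding matrix.
logit_effect = [
    sum(decoder_dir[i] * unembed[i][t] for i in range(D_MODEL))
    for t in range(VOCAB)
]

# Sort token ids by effect; the two ends give the negative and positive lists.
ranked = sorted(range(VOCAB), key=lambda t: logit_effect[t])
most_negative = ranked[:3]
most_positive = ranked[::-1][:3]
print("most negative token ids:", most_negative)
print("most positive token ids:", most_positive)
```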

Activations Density 0.007%
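
To give the density figure a sense of scale, it can be combined with the dashboard prompt count listed in the configuration. This is a rough sketch that assumes the density is the fraction of dashboard tokens on which the feature is active; the page does not state how the figure is defined, so the result is an order-of-magnitude estimate at best.

```python
N_PROMPTS = 12_288         # from "Prompts (Dashboard)"
TOKENS_PER_PROMPT = 128    # from "Prompts (Dashboard)"
DENSITY_PERCENT = 0.007    # from "Activations Density"

total_tokens = N_PROMPTS * TOKENS_PER_PROMPT

# Assumption: density is the share of tokens with a nonzero activation.
expected_active = total_tokens * DENSITY_PERCENT / 100

print(total_tokens)
print(round(expected_active))
```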

No Known Activations