Explanations

references to individuals or events related to a person named Hendricks

oai_token-act-pair · gpt-4o-mini · Triggered by @bot

Configuration

Juliushanhanhan/llama-3-8b-it-res/blocks.25.hook_resid_post

Features

65,536

Data Type

float32

Hook Name

blocks.25.hook_resid_post

Hook Layer

25

Architecture

gated

Context Size

1,024

Dataset

Juliushanhanhan/openwebtext-1b-llama3-tokenized-cxt-1024

Activation Function

relu
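
The configuration above describes a gated sparse autoencoder with a ReLU activation. As a rough illustration of what a gated encoder computes, the pure-Python toy below follows the commonly published Gated SAE formulation: a binary gate path decides which features fire, and a ReLU magnitude path, sharing the same weights up to a per-feature rescale, decides how strongly. The function name `gated_encode`, the parameter names, and the two-dimensional toy weights are all assumptions made for illustration; none of them come from this SAE's actual code or weights.

```python
import math


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * a for w, a in zip(row, x)) for row in W]


def gated_encode(x, W_gate, b_gate, r_mag, b_mag, b_dec):
    """Toy gated-SAE encoder (hypothetical parameter names).

    gate path:      1[W_gate (x - b_dec) + b_gate > 0]
    magnitude path: relu(exp(r_mag) * W_gate (x - b_dec) + b_mag)
    output:         gate * magnitude, elementwise
    """
    centered = [a - b for a, b in zip(x, b_dec)]
    shared = matvec(W_gate, centered)  # projection shared by both paths
    pre_gate = [p + b for p, b in zip(shared, b_gate)]
    pre_mag = [math.exp(r) * p + b for r, p, b in zip(r_mag, shared, b_mag)]
    gate = [1.0 if p > 0 else 0.0 for p in pre_gate]
    magnitude = [max(0.0, m) for m in pre_mag]  # ReLU
    return [g * m for g, m in zip(gate, magnitude)]


# Toy input: feature 0 passes the gate; feature 1 has a positive
# magnitude pre-activation but is switched off by its gate.
toy_acts = gated_encode(
    x=[1.0, 2.0],
    W_gate=[[1.0, 0.0], [0.0, -1.0]],
    b_gate=[0.0, 0.0],
    r_mag=[0.0, 0.0],
    b_mag=[0.5, 5.0],
    b_dec=[0.0, 0.0],
)
print(toy_acts)
```

In the real SAE the input would be the residual-stream vector at blocks.25.hook_resid_post and the output would have 65,536 entries, almost all of them zero.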

Negative Logits

boa      -0.17
lant     -0.15
taire    -0.15
illed    -0.15
adas     -0.15
odore    -0.14
eless    -0.14
ching    -0.14
neau     -0.14
uros     -0.14

Positive Logits

rix      0.28
erson    0.27
rick     0.26
rik      0.24
ricks    0.22
ry       0.20
reich    0.20
ri       0.18
rich     0.18
rych     0.17
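
Logit lists like the two above are typically obtained by projecting a feature's decoder direction through the model's unembedding matrix and ranking tokens by the resulting score. The sketch below shows that ranking step, assuming this page derives its lists the same way. The helper names `logit_effects` and `top_and_bottom` are hypothetical, and the two-dimensional vectors are made-up toy numbers, not the real Llama 3 unembedding or this feature's decoder weights.

```python
def logit_effects(decoder_dir, unembed):
    """Dot the feature's decoder direction with each token's unembedding vector.

    unembed maps token string -> vector of the same length as decoder_dir.
    """
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }


def top_and_bottom(effects, k):
    """Return the k most-promoted and the k most-suppressed tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)
    positive = ranked[:k]
    negative = ranked[-k:][::-1]  # most negative first
    return positive, negative


# Hypothetical toy vectors; only the token strings are taken from the page.
toy_dir = [1.0, 0.0]
toy_unembed = {
    "rix": [0.5, 1.0],
    "rick": [0.25, -1.0],
    "boa": [-0.5, 0.0],
    "lant": [-0.25, 2.0],
}

positive, negative = top_and_bottom(logit_effects(toy_dir, toy_unembed), k=2)
print(positive)
print(negative)
```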

Activations Density 0.005%
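
An activation density of 0.005% corresponds to the feature firing on roughly 5 tokens out of every 100,000. A minimal sketch of that calculation follows, assuming density means the fraction of tokens with an activation strictly above zero; the exact threshold the page uses is not stated, and `activation_density` is a hypothetical helper.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds the threshold."""
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy data: 100,000 tokens, 5 of which activate the feature.
toy_activations = [0.0] * 99_995 + [1.2] * 5
density = activation_density(toy_activations)
print(f"{density:.3%}")
```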


No Known Activations
