Explanations

negative or cautionary phrases and expressions related to safety or directives

oai_token-act-pair · gpt-4o-mini Triggered by @bot

Configuration

Juliushanhanhan/llama-3-8b-it-res/blocks.25.hook_resid_post

Features

65,536

Data Type

float32

Hook Name

blocks.25.hook_resid_post

Hook Layer

25

Architecture

gated

Context Size

1,024

Dataset

Juliushanhanhan/openwebtext-1b-llama3-tokenized-cxt-1024

Activation Function

relu
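
The configuration above names a gated architecture with a ReLU activation. As a rough illustration of what that combination usually means, here is a minimal pure-Python sketch of a gated sparse autoencoder encoder. All names (`gated_encode`, `W_gate`, `b_gate`, `r_mag`, `b_mag`, `b_dec`) and the toy weights are assumptions made for this example, not values or code taken from this SAE; the real dictionary has 65,536 features rather than three.

```python
import math

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def gated_encode(x, W_gate, b_gate, r_mag, b_mag, b_dec):
    """Toy gated-SAE encoder (hypothetical parameter names).

    The gate path decides WHICH features fire; the magnitude path,
    which reuses the gate weights with a per-feature rescale exp(r_mag),
    decides HOW STRONGLY they fire.
    """
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    pre = matvec(W_gate, centered)
    # Binary gate: 1 where the gate pre-activation is positive.
    gate = [1.0 if p + b > 0 else 0.0 for p, b in zip(pre, b_gate)]
    # Magnitude: ReLU of the rescaled pre-activation plus its own bias.
    mag = [max(0.0, math.exp(r) * p + b) for r, p, b in zip(r_mag, pre, b_mag)]
    return [g * m for g, m in zip(gate, mag)]

# Toy example: 2-dimensional input, 3 features (assumed values).
W_gate = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b_gate = [0.0, -0.5, 0.0]
r_mag = [0.0, 0.0, 0.0]
b_mag = [0.0, 0.0, 0.0]
b_dec = [0.0, 0.0]

features = gated_encode([1.0, 0.25], W_gate, b_gate, r_mag, b_mag, b_dec)
print(features)
```

In this toy run the second feature has a positive magnitude but a closed gate, so it contributes nothing. That separation of detection from magnitude is the point of the gated design.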

Negative Logits

ennon

-0.14

ácil

-0.14

slideUp

-0.14

 downfall

-0.14

ADIUS

-0.14

ός

-0.14

isis

-0.13

otime

-0.13

adr

-0.13

cntl

-0.13

Positive Logits

 touch

0.36

 Touch

0.32

touch

0.30

Touch

0.29

 TOUCH

0.28

 touched

0.28

 touches

0.27

-touch

0.27

 touching

0.27

_touch

0.22
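
Logit lists like the ones above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking the tokens by the resulting score. The sketch below assumes that procedure; the page itself does not state how its numbers were computed. The function names and the toy vectors are hypothetical and are not the real weights.

```python
def logit_effects(decoder_dir, unembed):
    """Dot the feature's decoder direction with each token's unembedding vector.

    decoder_dir: list of floats (length d_model)
    unembed: dict mapping token string -> list of floats (length d_model)
    """
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }

def top_k(effects, k, positive=True):
    """Return the k tokens with the largest (or most negative) effect."""
    return sorted(effects.items(), key=lambda kv: kv[1], reverse=positive)[:k]

# Toy 2-dimensional example (assumed values, for illustration only).
decoder_dir = [1.0, 0.0]
unembed = {
    " touch": [0.5, 0.1],
    "Touch": [0.25, -0.2],
    " downfall": [-0.125, 0.3],
}

effects = logit_effects(decoder_dir, unembed)
print(top_k(effects, 2, positive=True))
print(top_k(effects, 1, positive=False))
```

Note that token strings keep their leading space, so " touch" and "touch" are distinct vocabulary entries and are scored separately.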

Activations Density 0.043%
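
For a sense of scale, activation density is usually read as the fraction of token positions on which the feature has a nonzero activation. Under that assumption, the sketch below computes density for a toy activation list and estimates how many tokens in one 1,024-token context would fire at the reported 0.043%. The function name and the toy data are hypothetical.

```python
def activation_density(activations):
    """Fraction of token positions where the feature fires (activation > 0)."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > 0)
    return active / len(activations)

# Toy data: 10,000 token positions, 4 of them active (assumed values).
toy = [0.0] * 10_000
for i in (12, 345, 6789, 9000):
    toy[i] = 1.5

density = activation_density(toy)
print(f"{density:.3%}")  # prints 0.040%

# Expected active tokens per 1,024-token context at 0.043% density.
expected_per_context = 0.00043 * 1024
print(round(expected_per_context, 2))
```

At this density the feature fires on fewer than one token per context on average, which is consistent with a highly sparse, token-specific feature.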

No Known Activations