Explanations

references to white supremacist ideologies and groups

oai_token-act-pair · gpt-4o-mini Triggered by @bot

Configuration

Juliushanhanhan/llama-3-8b-it-res/blocks.25.hook_resid_post

Features: 65,536
Data Type: float32
Hook Name: blocks.25.hook_resid_post
Hook Layer: 25
Architecture: gated
Context Size: 1,024
Dataset: Juliushanhanhan/openwebtext-1b-llama3-tokenized-cxt-1024
Activation Function: relu
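The configuration lists a gated architecture with a ReLU activation. The pure-Python sketch below illustrates it under the Gated SAE formulation (Rajamanoharan et al., 2024). A binary gate decides which features are active, and a separate ReLU magnitude path decides how strongly they fire. Both paths share one encoder matrix, and the magnitude path is rescaled per feature by exp(r_mag). The parameter names (W_enc, r_mag, b_gate, b_mag, b_dec, W_dec) are assumptions chosen for illustration. The toy dimensions stand in for the real 65,536 features. This is not the code used to train or run this particular SAE.

```python
import math


def gated_sae_encode(x, W_enc, r_mag, b_gate, b_mag, b_dec):
    """Sketch of a gated SAE encoder (hypothetical parameter names).

    x:      input residual-stream vector, length d_model
    W_enc:  d_model x d_sae matrix (list of rows), shared by both paths
    r_mag:  per-feature log-scale applied to the magnitude path
    b_gate, b_mag: per-feature biases for the gate and magnitude paths
    b_dec:  decoder bias, subtracted from the input before encoding
    """
    d_model, d_sae = len(x), len(b_gate)
    x_c = [x[i] - b_dec[i] for i in range(d_model)]
    feats = []
    for j in range(d_sae):
        # Shared projection onto feature j's encoder column.
        proj = sum(x_c[i] * W_enc[i][j] for i in range(d_model))
        gate_pre = proj + b_gate[j]
        mag_pre = proj * math.exp(r_mag[j]) + b_mag[j]
        gate = 1.0 if gate_pre > 0 else 0.0  # Heaviside gate: on/off
        mag = max(0.0, mag_pre)              # ReLU magnitude
        feats.append(gate * mag)
    return feats


def decode(f, W_dec, b_dec):
    """Reconstruct x_hat = f @ W_dec + b_dec, with W_dec of shape d_sae x d_model."""
    return [b_dec[i] + sum(f[j] * W_dec[j][i] for j in range(len(f)))
            for i in range(len(b_dec))]
```

In the gate-closed example below, feature 0 has a positive magnitude but a negative gate pre-activation, so it is zeroed. This decoupling of "whether" from "how much" is the point of the gated design.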

Negative Logits

"ronic"     -0.17
"utr"       -0.17
"rug"       -0.15
"nette"     -0.14
"iro"       -0.14
"opp"       -0.14
"argest"    -0.14
"opping"    -0.14
"inue"      -0.14
"inde"      -0.14

Positive Logits

" groups"         0.28
" Groups"         0.23
"-groups"         0.20
"groups"          0.19
"Groups"          0.19
"(groups"         0.18
"_groups"         0.18
" organizations"  0.17
" group"          0.16
" Odin"           0.16
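Positive and negative logits on feature pages like this one are commonly obtained by projecting the feature's decoder direction through the model's unembedding matrix. The page then lists the highest- and lowest-scoring tokens. Whether this page was computed exactly that way is an assumption. The sketch below shows the projection and the top/bottom-k selection on a made-up four-token vocabulary. The function name, the vocabulary, and all values are hypothetical. Tokens such as " groups", "groups" and "(groups" are distinct vocabulary entries, which is why several spellings of the same word appear in the list above.

```python
def top_bottom_logits(direction, W_U, vocab, k=3):
    """Project a feature's decoder direction through an unembedding matrix.

    direction: decoder vector for one feature, length d_model
    W_U:       d_model x vocab_size unembedding matrix (list of rows)
    vocab:     token strings, one per unembedding column
    Returns (top, bottom) lists of (token, score), each of length k.
    """
    scores = [sum(direction[i] * W_U[i][t] for i in range(len(direction)))
              for t in range(len(vocab))]
    order = sorted(range(len(vocab)), key=lambda t: scores[t])  # ascending
    bottom = [(vocab[t], scores[t]) for t in order[:k]]
    top = [(vocab[t], scores[t]) for t in reversed(order[-k:])]
    return top, bottom


vocab = ["A", "B", "C", "D"]
W_U = [[0.5, -0.25, 0.25, 0.0], [0.25, -0.25, 0.0, -0.125]]
top, bottom = top_bottom_logits([1.0, 2.0], W_U, vocab, k=2)
print(top)     # [('A', 1.0), ('C', 0.25)]
print(bottom)  # [('B', -0.75), ('D', -0.25)]
```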

Activation Density: 0.038%
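Assume activation density means the fraction of sampled token positions at which the feature's activation is nonzero. Under that assumption, it can be computed as in the sketch below. The function name and the input format (a flat list of per-token activations) are hypothetical.

```python
def activation_density(acts):
    """Fraction of token positions where the feature fires (activation > 0)."""
    if not acts:
        raise ValueError("need at least one activation")
    return sum(1 for a in acts if a > 0) / len(acts)


print(activation_density([0.0, 1.2, 0.0, 0.0]))  # 0.25
```

At 0.038% (0.00038), the feature would fire on roughly 380 of every 1,000,000 tokens, or about one token in 2,600.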

No Known Activations