© Neuronpedia 2026

Neuronpedia

Andy Arditi · Finding Misaligned Persona Features in Open-Weight Models
Llama3.1-8B-IT (Instruct)
Resid Post - 131k
7-RESID-POST-AA
Index: 71828

Explanations

diagnosis

np_max-act · gemini-2.0-flash

Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_7/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.7.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted
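The configuration describes a "standard" SAE with 131,072 latents that reads the residual stream after block 7 (`blocks.7.hook_resid_post`). The sketch below shows what a standard ReLU SAE computes. It assumes the common form f = ReLU(W_enc(x − b_dec) + b_enc) with reconstruction x̂ = W_dec f + b_dec. Whether this trainer subtracts b_dec before encoding is an assumption. The helper names (`matvec`, `encode`, `decode`) and the tiny weights are hypothetical and serve only as illustration.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major: matrix[row][col]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product m @ v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def encode(x: Vector, w_enc: Matrix, b_enc: Vector, b_dec: Vector) -> Vector:
    """Latent activations f = ReLU(W_enc (x - b_dec) + b_enc).

    w_enc has shape (n_latents, d_model). Subtracting b_dec before encoding
    is an assumed convention, not confirmed for this particular trainer.
    """
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    pre_acts = [p + b for p, b in zip(matvec(w_enc, centered), b_enc)]
    return [max(0.0, p) for p in pre_acts]


def decode(f: Vector, w_dec: Matrix, b_dec: Vector) -> Vector:
    """Reconstruction x_hat = W_dec f + b_dec, with w_dec of shape (d_model, n_latents)."""
    return [r + b for r, b in zip(matvec(w_dec, f), b_dec)]


# Toy weights: d_model = 2, n_latents = 3 (the real SAE has 131,072 latents).
w_enc = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b_enc = [0.0, -0.5, 0.0]
w_dec = [[1.0, 0.0, -0.5], [0.0, 1.0, -0.5]]
b_dec = [0.0, 0.0]

x = [2.0, 0.25]  # stands in for one residual-stream vector
f = encode(x, w_enc, b_enc, b_dec)
print(f)                        # [2.0, 0.0, 0.0]
print(decode(f, w_dec, b_dec))  # [2.0, 0.0]
```

The ReLU zeroes every latent whose pre-activation is negative. This is why most of the 131,072 latents are inactive on any given token.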

Negative Logits

"ประว"  -0.07
"vature"  -0.07
" surround"  -0.06
"лл"  -0.06
" flows"  -0.06
" Fetch"  -0.06
"LLP"  -0.06
"enario"  -0.06
"isc"  -0.06
"getClient"  -0.06

Positive Logits

" Icon"  0.07
"\tassertThat"  0.06
"     ↵↵"  0.06
"_expand"  0.06
" giản"  0.06
"_Bool"  0.06
"_exceptions"  0.06
"/io"  0.06
"(IM"  0.06
"zad"  0.06
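Token lists like these are commonly produced by projecting the feature's decoder direction through the model's unembedding and reading off the largest and smallest entries. Whether this page follows exactly that procedure (for instance, whether a final layer norm is applied first) is an assumption. The sketch below uses hypothetical names (`logit_effects`, `top_and_bottom`) and a made-up four-token vocabulary.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, float]


def logit_effects(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot the feature's decoder direction with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec)) for tok, vec in unembed.items()}


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[List[Pair], List[Pair]]:
    """Return the k largest (positive logits) and k smallest (negative logits) entries."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    return ranked[::-1][:k], ranked[:k]


# Hypothetical 2-dimensional model with a 4-token vocabulary.
decoder_dir = [0.5, -0.5]
unembed = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, -1.0], "D": [0.0, 0.5]}

positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print(positive)  # [('C', 1.0), ('A', 0.5)]
print(negative)  # [('B', -0.5), ('D', -0.25)]
```

The magnitudes on this page are small and nearly uniform (0.06 to 0.07 in both directions). This suggests the feature has little direct effect on the output vocabulary at layer 7.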

Activation Density: 0.016%

No Known Activations
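Activation density is usually reported as the percentage of sampled tokens on which a feature's activation is strictly positive. A density of 0.016% corresponds to about 16 active tokens per 100,000. This page's exact threshold and token sample are not stated, so the zero threshold below is an assumption. `activation_density` is a hypothetical helper.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed 0.0)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# 16 firing tokens out of 100,000 sampled tokens.
acts = [1.0] * 16 + [0.0] * 99_984
print(activation_density(acts))  # 0.016
```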