Neuronpedia

Explanations

Toronto (np_max-act-logits · gemini-2.0-flash)

Configuration: andyrdt/saes-qwen2.5-7b-instruct/resid_post_layer_19/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.19.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted

Negative Logits

Token         Logit
".error"      -0.07
"(value"      -0.07
" unlaw"      -0.07
" Politics"   -0.07
"ffic"        -0.07
"lpVtbl"      -0.07
" vomiting"   -0.06
" flew"       -0.06
" Addiction"  -0.06
"㎄"          -0.06

Positive Logits

Token         Logit
"^\"          0.07
"stay"        0.07
"needle"      0.07
"爐"          0.07
"])),"        0.07
"----</"      0.07
" Resorts"    0.07
" cheered"    0.07
" ngại"       0.07
"sortBy"      0.06

Activations Density: 0.093%

No Known Activations
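
To put the density figure in perspective, the short sketch below converts it into rough per-token and per-context rates. It assumes that "density" means the fraction of tokens on which the feature has a nonzero activation, which this page does not define, so treat the results as a back-of-the-envelope reading rather than a documented statistic. The variable names are illustrative only.

```python
# Assumption: density = fraction of tokens on which the feature is active.
# The page reports 0.093% density and a context size of 1,024 tokens.
density = 0.093 / 100
context_size = 1024

# Expected number of activating tokens in one full-length context.
expected_active_tokens = density * context_size

# Average number of tokens between activations.
tokens_per_activation = 1 / density

print(f"Expected active tokens per context: {expected_active_tokens:.2f}")
print(f"Roughly one activation every {tokens_per_activation:.0f} tokens")
```

Under that assumption the feature would fire on slightly less than one token per 1,024-token context on average, which is consistent with a sparse feature.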

© Neuronpedia 2025