INDEX

Explanations

User queries that address the assistant directly in the second person (especially “you”), often as “Do/Can you …?” requests.

oai_token-act-pair · gpt-5 Triggered by @vetterc0

Configuration

andyrdt/saes-qwen2.5-7b-instruct/resid_post_layer_15/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.15.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted

Negative Logits

"俳"  -0.08
" Polygon"  -0.07
" loại"  -0.07
"葎"  -0.07
" additional"  -0.07
" וכמובן"  -0.07
".itemId"  -0.07
" Scandin"  -0.07
"硪"  -0.07
" словам"  -0.07

Positive Logits

"ุ"  0.07
"حة"  0.07
" créé"  0.07
"時点"  0.07
"口号"  0.07
"诊治"  0.07
"by"  0.07
"birth"  0.06
" bất"  0.06
" Reads"  0.06

Activations Density: 0.041%
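An activation density figure like this is usually read as the fraction of token positions on which the feature fires at all. The sketch below shows that calculation under the assumption that "fires" means an activation strictly above zero; the function name, the threshold, and the toy data are illustrative and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds the threshold."""
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy data only: 10,000 token positions, with the feature firing on 4 of them.
toy_activations = [0.0] * 10_000
for position in (17, 512, 4096, 9001):
    toy_activations[position] = 1.5

density = activation_density(toy_activations)
print(f"{density:.3%}")  # prints 0.040%
```

At a density near 0.04%, a feature would fire on roughly four tokens in every ten thousand.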

No Known Activations