Explanations

instructions and procedures
(np_max-act · gemini-2.0-flash)

sentences or phrases that give step-by-step instructions or procedural guidance, especially for emergency/safety responses.
(oai_token-act-pair · gpt-5-mini, triggered by @vetterc0)

Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_15/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.15.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted

Negative Logits

".tex": -0.08
"som": -0.07
" tối": -0.07
" nichž": -0.07
"КИ": -0.07
" багатьох": -0.07
"uğ": -0.07
"НО": -0.07
".cuda": -0.07
"帕": -0.07

Positive Logits

"обыти": 0.06
" urges": 0.06
"(enc": 0.05
" Marseille": 0.05
" misled": 0.05
" bookstore": 0.05
" Andr": 0.05
" retailer": 0.05
"yalty": 0.05
" Vintage": 0.05

Activation Density: 0.068%
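
As a rough illustration of what a density figure like 0.068% means, the sketch below assumes density is the fraction of token positions on which the feature's activation is above zero. That definition, the function name `activation_density`, and the synthetic activation values are all assumptions made for this example; the page itself does not state how the number was computed.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumed definition for illustration only; not taken from the dashboard.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic data: 100,000 token positions, 68 of which fire (hypothetical values).
total_tokens = 100_000
firing_tokens = 68
acts = [0.0] * (total_tokens - firing_tokens) + [1.5] * firing_tokens

density = activation_density(acts)
print(f"{density:.3%}")
```

Under this assumed definition, a feature with this density fires on roughly 68 of every 100,000 tokens, which marks it as a sparse feature.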
