Explanations

Blog/forum posts

np_max-act · gemini-2.0-flash

Instances where the speaker expresses personal low mood, depression, or stress, or asks for emotional help or support.

oai_token-act-pair · gpt-5-mini Triggered by @vetterc0

Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_11/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.11.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted

Negative Logits

하면    -0.06
 BSON    -0.06
课    -0.06
 photographed    -0.06
�    -0.06
姉    -0.06
ский    -0.06
้ำ    -0.06
يب    -0.06
ارب    -0.06

Positive Logits

pou    0.07
_ib    0.07
bew    0.07
vol    0.07
ワイト    0.07
 σχ    0.07
(theta    0.07
 negligent    0.06
 indiv    0.06
 wishing    0.06

Activations Density 0.024%
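
To make the density figure concrete, here is a minimal sketch. It assumes "activations density" means the fraction of token positions on which the feature's activation is above zero; the page itself does not define the term. The function name `activation_density`, the `threshold` parameter, and the toy activation values are hypothetical and used only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: this matches what the dashboard calls "activations density";
    the page does not state the exact definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy data (made up): the feature fires on 3 of 12,500 token positions.
toy_activations = [0.0] * 12_497 + [1.3, 0.4, 2.1]
density = activation_density(toy_activations)
print(f"density = {density:.3%}")

# At the listed context size of 1,024 tokens, a density of 0.024% works out
# to roughly a quarter of a firing token per context on average.
expected_per_context = 0.00024 * 1024
print(f"expected firing tokens per context = {expected_per_context:.3f}")
```

Under this reading, a density of 0.024% means the feature fires on about one token in every 4,000 or so, which fits a narrow feature of this kind.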
