Explanations

Conversational, subjective language

np_max-act · gemini-2.0-flash

Words that express negation or comparison/contrast (e.g., "n't", "than", comparative markers)

oai_token-act-pair · gpt-5-mini Triggered by @vetterc0

Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_7/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.7.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted

Negative Logits

" sailor"  -0.07
" пром"  -0.06
"ویزی"  -0.06
"Hit"  -0.06
"avana"  -0.06
"tips"  -0.06
"Collapse"  -0.06
"Perl"  -0.06
"альному"  -0.05
"ROWN"  -0.05

Positive Logits

"_notifier"  0.07
" enterprise"  0.07
" الوقت"  0.07
"(cp"  0.06
"^="  0.06
"(IC"  0.06
"_attention"  0.06
" apis"  0.06
",R"  0.06
"neu"  0.06

Activations Density 0.216%
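
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed. It assumes density means the fraction of sampled token positions where the feature's activation is above zero; the page does not state its exact definition. The function name `activation_density` and the sample data are hypothetical and chosen only to reproduce the 0.216% figure.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" counts any activation above zero as active;
    the dashboard may use a different cutoff or sampling scheme.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 27 of 12,500 token positions.
sample = [0.0] * 12473 + [1.5] * 27
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.216%
```

At this density the feature is active on roughly 1 in every 463 tokens, which is sparse, as expected for a dictionary of 131,072 features.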
