Explanations

Scary

np_max-act · gemini-2.0-flash

Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_11/trainer_1

Dataset (Dashboard)

Various

Features

131,072

Data Type

float32

Hook Name

blocks.11.hook_resid_post

Architecture

standard

Context Size

1,024

Dataset

monology/pile-uncopyrighted

Negative Logits

ropoda

-0.07

تیجه

-0.07

 undermining

-0.06

 hüküm

-0.06

<|end_of_text|>

-0.06

(album

-0.06

mpp

-0.06

eworthy

-0.06

useppe

-0.06

 유지

-0.06

Positive Logits

 frightening

0.11

 scare

0.10

 scared

0.09

 frightened

0.09

 scary

0.08

 scares

0.08

 scenarios

0.07

Sc

0.07

长

0.07

 remotely

0.07
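
Logit tables like the two above are commonly obtained by projecting a feature's decoder direction through the model's unembedding matrix and sorting the per-token scores. The page does not state how its values were computed, so the following is only a minimal sketch under that assumption. The function name `top_logits`, the toy vocabulary, and all numeric values are hypothetical and not taken from the model.

```python
def top_logits(decoder_direction, unembed, vocab, k=3):
    """Return the k most positive and k most negative (token, score) pairs.

    decoder_direction: list of d floats (the feature's decoder row).
    unembed: list of d rows, each a list of len(vocab) floats.
    """
    d = len(decoder_direction)
    # Score for token j is the dot product of the direction with column j.
    scores = [
        sum(decoder_direction[i] * unembed[i][j] for i in range(d))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


# Hypothetical toy data (2-dimensional residual stream, 4-token vocabulary).
vocab = [" scary", " scare", " album", " the"]
decoder_direction = [1.0, 0.5]
unembed = [
    [0.08, 0.06, -0.05, 0.00],
    [0.00, 0.08, -0.02, 0.01],
]
positive, negative = top_logits(decoder_direction, unembed, vocab, k=2)
```

Tokens keep their leading spaces here because, as in the tables above, " scary" and "scary" are distinct vocabulary entries.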

Activations Density 0.007%
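
Activation density is usually read as the share of sampled tokens on which the feature fires (activation above zero), expressed as a percentage. Under that reading, 0.007% corresponds to roughly 7 firing tokens per 100,000. The sketch below assumes this definition. `activation_density` and the sample data are hypothetical illustrations, not the dashboard's own code.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds the threshold."""
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical sample: 7 firing tokens out of 100,000.
sample = [0.0] * 99_993 + [1.2] * 7
density = activation_density(sample)
```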

No Known Activations
