Explanations

Toxicity/medical advice

np_max-act · gemini-2.0-flash

Finds instructions or urgent medical-action recommendations (advice to seek immediate care or specific steps to take) in poisoning or toxic-exposure contexts.

oai_token-act-pair · gpt-5-mini · Triggered by @vetterc0

Configuration: andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_15/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.15.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted

Negative Logits

" analyzer"    -0.07
" والد"    -0.07
" burden"    -0.07
"Orden"    -0.07
"utc"    -0.07
"ІІ"    -0.07
" manten"    -0.06
"кра"    -0.06
"_fire"    -0.06
"rica"    -0.06

Positive Logits

",'']]],↵"    0.06
"οι"    0.06
"STRUCTION"    0.06
",new"    0.06
" gypsum"    0.06
" vielleicht"    0.06
"UGHT"    0.06
"�"    0.06
".lucene"    0.06
" Sold"    0.06
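For readers unfamiliar with logit lists like the ones above, here is a minimal sketch of one common way such values are obtained: the feature's decoder direction is projected through the model's unembedding matrix, and the tokens with the smallest and largest resulting values are reported. This page does not state how its numbers were computed, so treat that as an assumption. The function name `top_logit_effects`, the toy matrices, and the toy vocabulary are all hypothetical and are not taken from the dashboard.

```python
import numpy as np

def top_logit_effects(decoder_direction, unembed, vocab, k=3):
    """Rank vocabulary tokens by how much one feature direction shifts their logits.

    decoder_direction: shape (d_model,), the feature's decoder vector (assumed input)
    unembed:           shape (d_model, vocab_size), the unembedding matrix (assumed input)
    vocab:             list of vocab_size token strings
    """
    effects = decoder_direction @ unembed            # shape (vocab_size,)
    order = np.argsort(effects)                      # ascending by effect
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: 3-dimensional residual stream, 3-token vocabulary.
toy_direction = np.array([1.0, -2.0, 0.5])
toy_unembed = np.eye(3)
toy_vocab = ["a", "b", "c"]

negative, positive = top_logit_effects(toy_direction, toy_unembed, toy_vocab, k=1)
print("negative:", negative)
print("positive:", positive)
```

With values as small and as evenly spread as the ±0.06 to ±0.07 shown here, such a projection gives little evidence that the feature promotes or suppresses any particular output token.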

Activations Density 0.028%
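As a rough illustration of what a density figure like this means, the sketch below computes the fraction of token positions on which a feature's activation exceeds zero. The function name `activation_density`, the zero threshold, and the toy data are assumptions for illustration; the page does not specify the exact definition or the sample used.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature activation exceeds threshold."""
    acts = np.asarray(activations, dtype=np.float32)
    return float(np.mean(acts > threshold))

# Toy data: the feature fires on 3 of 10,000 token positions.
toy_acts = np.zeros(10_000, dtype=np.float32)
toy_acts[:3] = 1.5

density = activation_density(toy_acts)
print(f"{density:.3%}")
```

Under this definition, a density of 0.028% would correspond to the feature firing on roughly 28 of every 100,000 tokens.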
