INDEX

Explanations

roman numerals and punctuation

np_max-act · gemini-2.0-flash

Configuration

andyrdt/saes-qwen2.5-7b-instruct/resid_post_layer_11/trainer_1

Dataset (Dashboard)

Various

Features

131,072

Data Type

float32

Hook Name

blocks.11.hook_resid_post

Architecture

standard

Context Size

1,024

Dataset

monology/pile-uncopyrighted

Negative Logits

מוז

-0.07

(',')[

-0.07

קטע

-0.07

!(:

-0.07

ros

-0.07

蒙古

-0.07

fé

-0.06

公共资源

-0.06

舉辦

-0.06

有效性

-0.06

Positive Logits

десят

0.08

 dynam

0.07

덟

0.07

sy

0.07

"><!--

0.07

nem

0.07

毁

0.07

 associ

0.06

ươi

0.06

洗涤

0.06

Activations Density

0.023%
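
The density figure can be read as the share of token positions on which the feature fires. The sketch below is a minimal illustration under that assumption; the function name `activation_density`, the zero threshold, and the sample data are hypothetical and are not taken from the dashboard's own pipeline.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature is active. The dashboard may define or estimate it differently.
    """
    if not activations:
        raise ValueError("activations must not be empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 token positions, 23 of which activate the feature.
sample = [0.0] * 99_977 + [1.5] * 23
density = activation_density(sample)
print(f"{density:.3%}")
```

With this made-up sample the feature is active on 23 of 100,000 positions, which is the same order of sparsity as the value reported above.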

No Known Activations
