
Explanations

Japanese honorifics

np_max-act · gemini-2.0-flash

The neuron activates on Japanese polite honorific prefixes (お "o-" and ご "go-", which are often used to show respect).

oai_token-act-pair · o4-mini · Triggered by @xinyanhu8
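The explanation can be turned into a crude token-level check. The sketch below is a hypothetical heuristic, not the scoring used by either auto-interp method listed here. It flags any token whose text (ignoring leading whitespace) starts with お or ご, so it will also fire on words like おいしい ("delicious"), where お is part of the word rather than an honorific prefix.

```python
from typing import List, Sequence

# Kana prefixes named in the explanation (hypothetical heuristic, not an official list).
HONORIFIC_PREFIXES = ("お", "ご")


def flag_honorific_prefix(tokens: Sequence[str]) -> List[bool]:
    """Return True for each token that begins with お or ご after stripping leading spaces."""
    return [tok.lstrip().startswith(HONORIFIC_PREFIXES) for tok in tokens]


# お茶 (tea) and ご are true hits; おいしい is a false positive of this heuristic.
print(flag_honorific_prefix(["お茶", "を", "ご", "飯", "おいしい"]))
```

Comparing these flags against real activations would show where the short explanation under- or over-describes the feature.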


Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_11/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.11.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted
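The configuration describes a dictionary of 131,072 features trained on the layer-11 residual stream (blocks.11.hook_resid_post). The sketch below assumes the common ReLU formulation of a "standard" SAE, f = ReLU(W_enc (x - b_dec) + b_enc) and x_hat = W_dec f + b_dec; the exact parameterization used by this trainer is not stated on the page, and the function names, matrix layout, and toy dimensions are illustrative. It is written in plain Python to stay dependency-free.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # assumed layout: one row per feature


def relu(v: Vector) -> Vector:
    """Elementwise ReLU."""
    return [max(0.0, a) for a in v]


def sae_encode(x: Vector, W_enc: Matrix, b_enc: Vector, b_dec: Vector) -> Vector:
    """Feature activations f = ReLU(W_enc (x - b_dec) + b_enc)."""
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    pre = [sum(w * c for w, c in zip(row, centered)) + b for row, b in zip(W_enc, b_enc)]
    return relu(pre)


def sae_decode(f: Vector, W_dec: Matrix, b_dec: Vector) -> Vector:
    """Reconstruction x_hat = sum_i f[i] * W_dec[i] + b_dec."""
    x_hat = list(b_dec)
    for fi, direction in zip(f, W_dec):
        for j, w in enumerate(direction):
            x_hat[j] += fi * w
    return x_hat


# Toy example: d_model = 2 and three features (the real SAE has 131,072).
W_enc = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
W_dec = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
b_enc = [0.0, 0.0, 0.0]
b_dec = [0.0, 0.0]

f = sae_encode([1.0, 2.0], W_enc, b_enc, b_dec)
print(f)  # [1.0, 2.0, 0.0]
print(sae_decode(f, W_dec, b_dec))  # [1.0, 2.0]
```

The ReLU zeroes the third feature, which is why most of the 131,072 features are inactive on any given token.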


Negative Logits
Token        Logit
" Kenn"      -0.08
"Better"     -0.07
"売"         -0.07
" murderous" -0.07
"survey"     -0.07
"_Show"      -0.06
"_upgrade"   -0.06
"Nh"         -0.06
" dung"      -0.06
"marshall"   -0.06

Positive Logits
Token        Logit
"_stock"     0.07
" otom"      0.07
"하신"       0.06
".custom"    0.06
"하시"       0.06
"�"          0.06
"(!["        0.06
" výši"      0.06
"zip"        0.06
"::::::::::::::::::::::::::::::::" 0.06
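Logit lists like these are commonly computed by projecting a feature's decoder direction through the model's unembedding matrix and reading off the most positive and most negative tokens. The page does not describe its procedure, so the following is a sketch under that assumption; `logit_effects`, the row-per-token unembedding layout, and the toy vocabulary are hypothetical.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    unembed_rows: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k most positive, k most negative) tokens by dot product.

    unembed_rows[t] is assumed to be the d_model-dimensional unembedding
    vector for vocab[t].
    """
    scores = [
        (tok, sum(d * u for d, u in zip(decoder_dir, row)))
        for tok, row in zip(vocab, unembed_rows)
    ]
    ranked = sorted(scores, key=lambda pair: pair[1])  # most negative first
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return positive, negative


# Toy example with a three-token vocabulary.
pos, neg = logit_effects([1.0, 0.0], [[0.5, 0.0], [-0.25, 1.0], [0.1, 3.0]], ["a", "b", "c"], k=1)
print(pos)  # [('a', 0.5)]
print(neg)  # [('b', -0.25)]
```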

Activation Density: 0.009%
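Assuming density means the fraction of dataset tokens on which the feature's activation is positive, 0.009% is 9 × 10⁻⁵, or about one activating token in every 11,000 (1 / 0.00009 ≈ 11,111). Across a 1,024-token context that averages roughly 0.09 activations. A minimal sketch of that definition, with a hypothetical helper name:

```python
from typing import Sequence


def activation_density(activations: Sequence[float]) -> float:
    """Fraction of tokens on which the feature fires (activation > 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    return sum(1 for a in activations if a > 0) / len(activations)


print(activation_density([0.0, 0.0, 1.5, 0.0]))  # 0.25
print(round(1 / 0.00009))  # 11111 tokens per activation at 0.009% density
```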


No Known Activations