Explanations

general text

np_max-act · gemini-2.0-flash

The neuron detects directive or policy‐specification language—that is, the instruction and guideline sections of the prompt where roles, rules, or required behaviors are laid out.

oai_token-act-pair · o4-mini Triggered by @xinyanhu8

financial calculations involving interest rates.

oai_token-act-pair · gpt-4o-mini Triggered by @xinyanhu8

Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_11/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.11.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted

Negative Logits

":**"  -0.06
".gstatic"  -0.06
"ocks"  -0.06
"zos"  -0.06
".lv"  -0.06
"อลลาร"  -0.06
"icus"  -0.06
"лів"  -0.06
".mybatisplus"  -0.06
".Skin"  -0.06

Positive Logits

"trajectory"  0.07
" efficiencies"  0.06
"[J"  0.06
" Want"  0.06
" Statistical"  0.06
" './../../"  0.06
"ikipedia"  0.06
"ت"  0.06
" leng"  0.06
" realities"  0.06

Activations Density 0.075%
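
One common reading of an activation density figure is the fraction of token positions on which the feature fires at all. The sketch below illustrates that reading only; the dashboard's exact definition is not stated here, and the function name, the zero threshold, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means share of positions with a nonzero
    activation. The dashboard may define it differently.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 3 of 4,000 token positions.
sample = [0.0] * 3997 + [0.8, 1.2, 0.3]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.075% means the feature is active on roughly 3 of every 4,000 tokens, so it is a sparse feature.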
