Explanations

statements

np_max-act · gemini-2.0-flash

This neuron responds to the core definition sentence that says a summary is factually consistent if “all statements in the summary are entailed by the document.”

oai_token-act-pair · o4-mini Triggered by @xinyanhu8

Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_11/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.11.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted

Negative Logits

وره    -0.08
okers    -0.07
ousedown    -0.06
Creators    -0.06
Danger    -0.06
角    -0.06
ùy    -0.06
اءات    -0.06
讯    -0.06
Matches    -0.06

Positive Logits

.Txt    0.07
 contempt    0.07
 depicting    0.06
="-    0.06
vat    0.06
 pledged    0.06
 komen    0.06
 altern    0.06
_LOWER    0.06
.api    0.06

Activations Density 0.001%
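
To put the density figure in perspective, the short sketch below converts it into an approximate firing rate. It assumes that "Activations Density" means the fraction of tokens on which the feature has a nonzero activation, and that the displayed 0.001% is a rounded value, so the results are only rough estimates. The variable names are illustrative and not taken from the dashboard.

```python
# Assumption: "Activations Density" is the fraction of tokens on which
# the feature activates; the dashboard value is likely rounded.
density_percent = 0.001   # value shown on the dashboard, in percent
context_size = 1024       # tokens per context, from the configuration

density = density_percent / 100               # percent -> fraction
tokens_per_activation = 1 / density           # average tokens between activations
expected_per_context = density * context_size # expected activations in one context

print(f"about 1 activation per {tokens_per_activation:,.0f} tokens")
print(f"about {expected_per_context:.4f} activations per {context_size}-token context")
```

Under these assumptions the feature fires on roughly one token in a hundred thousand, so most 1,024-token contexts would contain no activation at all.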

No Known Activations