
Explanations

Comparing to human behavior

np_max-act · gemini-2.0-flash

interactions and relationships between the user and AI characters.

oai_token-act-pair · gpt-4o-mini Triggered by @xinyanhu8

This neuron fires on tokens in the common comparison phrase “like any other human would.”

oai_token-act-pair · o4-mini Triggered by @xinyanhu8


Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_11/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.11.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted


Negative Logits

-0.07  "Tiles"
-0.06  " Öğ"
-0.06  " chip"
-0.06  "mel"
-0.06  " intend"
-0.06  "AZ"
-0.06  "新"
-0.06  " Knowing"
-0.06  "<!"
-0.06  "terrain"

Positive Logits

0.08  "SCALL"
0.06  (token not rendered)
0.06  " Archbishop"
0.06  ".jdesktop"
0.06  "_TA"
0.06  "菲"
0.06  " wirklich"
0.06  " peacefully"
0.06  " watchdog"
0.06  "нциклопед"
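
The logit lists above are the kind of output usually obtained by projecting a feature's decoder direction through the model's unembedding matrix and ranking the vocabulary by the resulting score. The page does not say how its numbers were produced, so the following is only a minimal sketch of that common technique. The function names, the toy vocabulary, and all numbers are hypothetical and are not taken from this SAE.

```python
def logit_effects(decoder_direction, unembed):
    """Project a decoder direction through an unembedding matrix.

    decoder_direction: list of d_model floats (hypothetical feature direction).
    unembed: d_model rows, each a list of vocab_size floats.
    Returns one score per vocabulary entry.
    """
    d_model = len(decoder_direction)
    vocab_size = len(unembed[0])
    return [
        sum(decoder_direction[i] * unembed[i][v] for i in range(d_model))
        for v in range(vocab_size)
    ]


def top_and_bottom(scores, vocab, k):
    """Return the k highest- and the k lowest-scoring (token, score) pairs."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k], ranked[::-1][:k]


# Toy data only: a 2-dimensional "model" with a 4-token vocabulary.
vocab = ["a", "b", "c", "d"]
direction = [1.0, -0.5]
unembed = [
    [0.08, -0.02, 0.01, -0.07],
    [0.00, 0.04, -0.10, 0.00],
]

scores = logit_effects(direction, unembed)
top, bottom = top_and_bottom(scores, vocab, k=2)
print("positive:", top)
print("negative:", bottom)
```

In this toy run, "a" and "c" receive the largest positive scores and "d" and "b" the most negative ones. With a real model, `direction` would be one row of the SAE decoder and `unembed` the language model's output embedding.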

Activations Density 0.006%
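
As a rough illustration of what a density of 0.006% means, the sketch below assumes that density is the fraction of sampled tokens on which the feature has a non-zero activation. That definition is an assumption, since the page does not state it, and `activation_density` and the toy activations are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumes density = (# tokens with activation > threshold) / (# tokens).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data: 6 active tokens out of 100,000 sampled tokens.
toy_activations = [0.0] * 99_994 + [1.3] * 6
density = activation_density(toy_activations)
print(f"{density:.3%}")  # prints 0.006%
```

Under this assumption, a density of 0.006% corresponds to about 6 active tokens per 100,000 sampled tokens.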


No Known Activations
