INDEX

Explanations

stop words

np_max-act · gemini-2.0-flash

This neuron detects mentions of similarity (e.g. “similarity,” “similar,” “similarities”) in the context of recommendation algorithms.

oai_token-act-pair · o4-mini · triggered by @xinyanhu8

Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_11/trainer_1

Dataset (dashboard): Various
Features: 131,072
Data type: float32
Hook name: blocks.11.hook_resid_post
Architecture: standard
Context size: 1,024 tokens
Dataset: monology/pile-uncopyrighted
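
How a feature activation is produced from this configuration can be sketched as follows. This is a minimal illustration, assuming that "standard" denotes a plain ReLU sparse autoencoder of the form f = ReLU(W_enc (x - b_dec) + b_enc) with reconstruction x_hat = W_dec f + b_dec, applied to the residual stream read at blocks.11.hook_resid_post. The function names and toy dimensions are hypothetical; the real SAE maps a 4,096-dimensional Llama 3.1 8B residual vector to 131,072 features.

```python
from typing import List


def relu(v: float) -> float:
    """Rectified linear unit on a single value."""
    return v if v > 0.0 else 0.0


def encode_standard_sae(
    x: List[float],            # residual-stream vector, length d_model (4,096 in the real model)
    w_enc: List[List[float]],  # encoder weights, shape [n_features][d_model]
    b_enc: List[float],        # encoder bias, length n_features (131,072 in the real SAE)
    b_dec: List[float],        # decoder bias, length d_model, subtracted before encoding
) -> List[float]:
    """Assumed 'standard' SAE encoder: f = ReLU(W_enc (x - b_dec) + b_enc)."""
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    return [
        relu(sum(w * c for w, c in zip(row, centered)) + b)
        for row, b in zip(w_enc, b_enc)
    ]


def decode_standard_sae(
    f: List[float],            # feature activations, length n_features
    w_dec: List[List[float]],  # decoder weights, shape [d_model][n_features]
    b_dec: List[float],        # decoder bias, length d_model
) -> List[float]:
    """Assumed reconstruction: x_hat = W_dec f + b_dec."""
    return [sum(w * fi for w, fi in zip(row, f)) + b for row, b in zip(w_dec, b_dec)]


# Toy usage with d_model = 2 and n_features = 3 (hypothetical numbers).
f = encode_standard_sae(
    [1.0, 2.0],
    [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]],
    [0.0, -1.0, 0.0],
    [0.5, 0.5],
)
x_hat = decode_standard_sae(f, [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [0.5, 0.5])
print(f, x_hat)
```

The third feature's pre-activation is negative, so the ReLU zeroes it; that clipping is what keeps most of the 131,072 features silent on any given token.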

Negative Logits

-0.07  " Guantanamo"
-0.06  "cnt"
-0.06  "lu"
-0.06  " ultrasound"
-0.06  "jte"
-0.06  "Ala"
-0.06  "emo"
-0.06  ".MixedReality"
-0.06  "grown"

Positive Logits

0.07  "ruh"
0.06  " đẹp"
0.06  "가능"
0.06  "_posts"
0.06  "_define"
0.06  "enger"
0.06  "Gum"
0.06  "�"
0.06  "'),↵"
0.06  " _↵↵"
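
One common way to produce lists like these is to project the feature's decoder direction onto each token's unembedding vector and sort the scores. The sketch below assumes exactly that, and it ignores the final RMSNorm; the dashboard may compute its logits differently. The function names and the three-token toy vocabulary are hypothetical.

```python
from typing import Dict, List, Tuple


def logit_effects(
    decoder_dir: List[float],           # the feature's decoder vector, length d_model
    w_unembed: Dict[str, List[float]],  # token -> unembedding vector, length d_model (toy)
) -> List[Tuple[str, float]]:
    """Score each token by the dot product with the decoder direction, sorted ascending."""
    scores = [
        (tok, sum(d * u for d, u in zip(decoder_dir, vec)))
        for tok, vec in w_unembed.items()
    ]
    return sorted(scores, key=lambda pair: pair[1])


def top_and_bottom(
    scores: List[Tuple[str, float]], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k most positive, highest first) and (k most negative, lowest first)."""
    return scores[-k:][::-1], scores[:k]


# Toy usage: "a" is pushed up, "b" is pushed down, "c" is unaffected.
scores = logit_effects([1.0, -1.0], {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.5, 0.5]})
positive, negative = top_and_bottom(scores, 1)
print(positive, negative)
```

Under this assumption, magnitudes as small as the ±0.06 shown above mean the feature barely moves any single output token directly.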

Activation density: 0.024%
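
Activation density is usually read as the fraction of token positions on which the feature is active; 0.024% is roughly one token in 4,200. The sketch below assumes "active" means an activation strictly above zero; the helper name and the synthetic activation list are hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds the threshold."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return active / total


# Synthetic example: 24 firing positions out of 100,000 tokens.
acts = [0.0] * 99_976 + [1.0] * 24
density = activation_density(acts)
print(f"{density * 100:.3f}%")
```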

No Known Activations