
Explanations

It finds words relating to a person's willingness to take action, or words that express the speaker's opinion.
(oai_token-act-pair · gemini-2.0-flash)

do
(np_max-act-logits · gemini-2.0-flash)


Configuration

google/gemma-scope-2b-pt-transcoders/layer_25/width_16k/average_l0_41

Prompts (Dashboard): 24,576 prompts, 128 tokens each
Dataset (Dashboard): monology/pile-uncopyrighted
Features: 16,384
Data Type: float32
Hook Name: blocks.25.ln2.hook_normalized
Architecture: jumprelu_transcoder
Context Size: 1,024
Dataset: monology/pile-uncopyrighted
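The architecture is listed as jumprelu_transcoder, read from the hook blocks.25.ln2.hook_normalized. As a rough illustration of what that implies, here is a minimal pure-Python sketch of a JumpReLU transcoder forward pass on toy numbers. The function names, weight layout (one row per feature), and all values are hypothetical and are not taken from the released weights or from any library API; the only assumption about the method is the standard JumpReLU definition, which keeps a pre-activation only when it exceeds a per-feature threshold.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def jumprelu(pre_acts: Vector, thresholds: Vector) -> Vector:
    """Keep each pre-activation only if it exceeds its feature's threshold."""
    return [z if z > t else 0.0 for z, t in zip(pre_acts, thresholds)]


def transcoder_forward(
    x: Vector,
    w_enc: Matrix,      # hypothetical layout: one row per feature, length d_in
    b_enc: Vector,
    thresholds: Vector,
    w_dec: Matrix,      # hypothetical layout: one row per feature, length d_out
    b_dec: Vector,
) -> Tuple[Vector, Vector]:
    """Return (feature activations, reconstructed output) for one token."""
    # Encoder: one pre-activation per feature.
    pre = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(w_enc, b_enc)]
    acts = jumprelu(pre, thresholds)
    # Decoder: bias plus the activation-weighted sum of decoder rows.
    out = list(b_dec)
    for a, row in zip(acts, w_dec):
        for j, w in enumerate(row):
            out[j] += a * w
    return acts, out


# Toy example: 2 input dims, 3 features, 2 output dims.
x = [1.0, -2.0]
w_enc = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
b_enc = [0.0, 0.0, 0.0]
thresholds = [0.5, 0.5, 0.5]
w_dec = [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b_dec = [0.1, 0.0]

acts, out = transcoder_forward(x, w_enc, b_enc, thresholds, w_dec, b_dec)
```

Unlike a plain ReLU, a small positive pre-activation such as 0.3 is zeroed when the threshold is 0.5, which is what keeps the number of active features per token low. In a transcoder (as opposed to an autoencoder) the decoder output is meant to approximate a different activation than the one it read from, here presumably the MLP output for the layer whose normalized input is hooked.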


Negative Logits

Token (quoted to show leading spaces)    Logit
" nearly"                                -3.05
"nearly"                                 -2.92
" Nearly"                                -2.91
"Almost"                                 -2.83
" Almost"                                -2.81
"Nearly"                                 -2.81
" almost"                                -2.77
"almost"                                 -2.72
" virtually"                             -2.31
" quase"                                 -2.25

Positive Logits

Token (quoted to show leading spaces)    Logit
"featureID"                              0.85
"AndEndTag"                              0.73
" Normdatei"                             0.66
"MLLoader"                               0.64
"XMLSchema"                              0.61
"談社"                                   0.59
" يتيمه"                                 0.58
"WaitGroup"                              0.58
"новништво"                              0.58
"mybatisplus"                            0.57

Activations Density 5.902%
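To give the density figure some scale, the short calculation below estimates how many tokens the feature fires on. It assumes, without confirmation from this page, that the density is the fraction of tokens with a nonzero activation and that it was measured over the dashboard prompts listed under Configuration (24,576 prompts of 128 tokens each). The variable names are illustrative only.

```python
# Assumption: density = fraction of dashboard tokens with nonzero activation.
n_prompts = 24_576
tokens_per_prompt = 128
density = 5.902 / 100  # "Activations Density 5.902%"

total_tokens = n_prompts * tokens_per_prompt
estimated_active_tokens = round(total_tokens * density)

print(f"total tokens: {total_tokens:,}")
print(f"estimated active tokens: {estimated_active_tokens:,}")
```

Under those assumptions the feature would be active on roughly 186 thousand of about 3.1 million tokens.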

