Explanations

Sentences with multiple instances of particular words. The feature highlights several kinds of specialized and academic language containing repeated short words such as "of", "the", "in", and "to", often alongside longer, content-rich words from a variety of domains.
oai_token-act-pair · gemini-2.0-flash

Negative traits.
np_max-act-logits · gemini-2.0-flash

Configuration

google/gemma-scope-2b-pt-transcoders/layer_25/width_16k/average_l0_41

Prompts (Dashboard): 24,576 prompts, 128 tokens each
Dataset (Dashboard): monology/pile-uncopyrighted
Features: 16,384
Data Type: float32
Hook Name: blocks.25.ln2.hook_normalized
Architecture: jumprelu_transcoder
Context Size: 1,024
Dataset: monology/pile-uncopyrighted
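
The architecture entry, jumprelu_transcoder, together with the hook name, suggests a transcoder that reads the normalized MLP input at layer 25 and gates each feature's pre-activation with a JumpReLU. A minimal sketch of that gate in plain Python follows. The function names, toy weights, and thresholds are hypothetical and are not taken from the Gemma Scope release.

```python
from typing import List


def jumprelu(pre_acts: List[float], thresholds: List[float]) -> List[float]:
    """Pass a pre-activation through unchanged if it exceeds its
    per-feature threshold; otherwise output zero."""
    return [z if z > t else 0.0 for z, t in zip(pre_acts, thresholds)]


def encode(x: List[float], w_enc: List[List[float]],
           b_enc: List[float], thresholds: List[float]) -> List[float]:
    """Compute feature activations; w_enc holds one row per feature."""
    pre = [sum(w * xi for w, xi in zip(row, x)) + b
           for row, b in zip(w_enc, b_enc)]
    return jumprelu(pre, thresholds)


def decode(acts: List[float], w_dec: List[List[float]],
           b_dec: List[float]) -> List[float]:
    """Reconstruct the output vector; w_dec holds one row per feature."""
    out = list(b_dec)
    for a, row in zip(acts, w_dec):
        if a != 0.0:  # inactive features contribute nothing
            for j, w in enumerate(row):
                out[j] += a * w
    return out


# Toy example with 3 hypothetical features and a 2-dimensional input.
x = [1.0, -2.0]
w_enc = [[0.5, 0.0], [0.0, 1.0], [1.0, 1.0]]
b_enc = [0.0, 0.0, 1.5]
thresholds = [0.4, 0.1, 1.0]
w_dec = [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b_dec = [0.1, 0.1]

acts = encode(x, w_enc, b_enc, thresholds)
out = decode(acts, w_dec, b_dec)
print(acts, out)
```

In the toy example the third feature has a positive pre-activation (0.5) that a plain ReLU would pass, but the JumpReLU zeroes it because it falls below that feature's threshold of 1.0.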

Negative Logits

'},      -0.84
.)}      -0.78
")));    -0.71
"},      -0.71
*/}      -0.69
"];      -0.68
')));    -0.68
}*/      -0.67
)]       -0.66
'],      -0.66

Positive Logits

 defamation      0.76
 claust          0.74
 Aggression      0.73
 incest          0.72
 irony           0.69
 impartiality    0.69
 bragging        0.69
 Enforcement     0.68
 arrogance       0.68
 hypocrisy       0.68
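
Logit lists like the two above are commonly obtained by projecting a feature's decoder direction through the model's unembedding matrix and ranking the vocabulary by the resulting scores. The sketch below assumes that is how these values were produced, which the page itself does not state. The function name, the toy vocabulary, and all numbers are hypothetical.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]],
                  k: int) -> Tuple[Ranked, Ranked]:
    """Score every token by the dot product of the feature's decoder
    direction with that token's unembedding vector, then return the
    k most negative and k most positive tokens."""
    scores = {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    most_negative = ranked[:k]
    most_positive = ranked[::-1][:k]
    return most_negative, most_positive


# Toy 2-dimensional example (hypothetical values).
decoder_dir = [1.0, -1.0]
unembed = {
    " irony": [0.5, -0.25],
    " hypocrisy": [0.25, -0.25],
    "'},": [-0.5, 0.25],
    ")]": [-0.25, 0.25],
    " the": [0.0, 0.0],
}

negative, positive = logit_effects(decoder_dir, unembed, k=2)
print(negative)
print(positive)
```

Under this reading, a positive value means the feature pushes the model toward predicting that token next, and a negative value means it pushes away from it.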

Activations Density 19.832%
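
To put the density figure in concrete terms, the short calculation below converts it into a token count. It assumes the density is the fraction of tokens in the dashboard prompts (24,576 prompts of 128 tokens each) on which the feature has a nonzero activation. The page does not define the metric, so treat this reading as an assumption.

```python
# Values copied from the configuration and density lines of this page.
prompts = 24_576
tokens_per_prompt = 128
density = 19.832 / 100  # activation density as a fraction

# Assumption: density = active tokens / all dashboard tokens.
total_tokens = prompts * tokens_per_prompt
active_tokens = round(total_tokens * density)

print(total_tokens)   # 3145728
print(active_tokens)  # 623861
```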
