Explanations

legal language related to defamation and workplace issues

oai_token-act-pair · gemini-2.0-flash

legal proceedings

np_max-act · gemini-2.0-flash

Configuration

google/gemma-scope-2b-pt-transcoders/layer_4/width_16k/average_l0_88

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Features

16,384

Data Type

float32

Hook Name

blocks.4.ln2.hook_normalized

Architecture

jumprelu_transcoder

Context Size

1,024

Dataset

monology/pile-uncopyrighted
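
To make the "jumprelu_transcoder" architecture entry concrete, here is a minimal NumPy sketch of a JumpReLU transcoder forward pass. It assumes the usual reading of the configuration above: the input is the normalized residual stream at the listed hook (the MLP input), and the output is a prediction of that layer's MLP output. The function names, toy dimensions, random weights, and the threshold value of 1.0 are all hypothetical and are not taken from the released model or any library API.

```python
import numpy as np

def jumprelu(pre_acts, threshold):
    """JumpReLU: keep a pre-activation unchanged only where it exceeds its threshold."""
    return pre_acts * (pre_acts > threshold)

def transcoder_forward(x, W_enc, b_enc, threshold, W_dec, b_dec):
    """Encode MLP-input activations into sparse features, then decode a predicted MLP output."""
    feats = jumprelu(x @ W_enc + b_enc, threshold)
    return feats, feats @ W_dec + b_dec

# Toy sizes for illustration only; the page lists 16,384 features.
rng = np.random.default_rng(0)
d_model, n_features = 8, 32
x = rng.normal(size=(4, d_model))               # 4 token positions (hypothetical input)
W_enc = rng.normal(size=(d_model, n_features))  # random stand-in encoder weights
b_enc = np.zeros(n_features)
threshold = np.full(n_features, 1.0)            # hypothetical per-feature thresholds
W_dec = rng.normal(size=(n_features, d_model))  # random stand-in decoder weights
b_dec = np.zeros(d_model)

feats, mlp_out_hat = transcoder_forward(x, W_enc, b_enc, threshold, W_dec, b_dec)
```

Unlike a plain ReLU, any feature value that survives the threshold is strictly greater than it, so small pre-activations are zeroed rather than passed through.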

Negative Logits

슷

-0.58

NECTION

-0.54

kháu

-0.54

 Cuff

-0.50

etka

-0.50

apparent

-0.50

läs

-0.50

savings

-0.49

__":

-0.49

recogn

-0.48

Positive Logits

 libel

0.75

 slander

0.73

Rüyada

0.67

 rumor

0.61

 defamation

0.59

<bos>

0.56

tagHelperRunner

0.55

0.54

 rumors

0.54

谣

0.54

Activations Density 1.939%
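
As a rough illustration of what an activation density figure measures, the sketch below computes the fraction of token positions on which a feature is nonzero. The helper name and the toy activations are hypothetical. If the density above is measured over the dashboard prompts (an assumption), 1.939% of 24,576 × 128 = 3,145,728 token positions would be roughly 61,000 positions.

```python
import numpy as np

def activation_density(acts):
    """Fraction of token positions where the feature activation is nonzero."""
    acts = np.asarray(acts)
    return np.count_nonzero(acts) / acts.size

# Hypothetical activations for one feature: 2 prompts x 5 tokens.
toy_acts = np.array([[0.0, 0.0, 1.2, 0.0, 0.0],
                     [0.0, 0.0, 0.0, 0.0, 0.0]])
density = activation_density(toy_acts)
print(f"{density:.3%}")  # prints 10.000%
```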
