Explanations

specific concepts and outcomes

The neuron activates on single-word abstract concepts or measured quantities (e.g. “evidence,” “details,” “numbers,” “consequences”), picking out nouns that express ideas or metrics rather than concrete objects.

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token                   Logit
selben                  -2.00
(token not rendered)    -1.88
(token not rendered)    -1.77
(token not rendered)    -1.75
로                      -1.67
蕺                      -1.65
(token not rendered)    -1.54
𝒕                       -1.54
 bemerk                 -1.53

Positive Logits

Token           Logit
 there          2.25
has             1.78
 through        1.70
 they           1.69
 Without        1.63
犼              1.63
 says           1.62
 tells          1.62
 purposefully   1.60
 wants          1.58

Activations Density 0.121%
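
To give a feel for what a density of 0.121% means at this dashboard's scale, the short sketch below combines it with the prompt configuration listed above (24,576 prompts of 128 tokens each). It assumes, and this is only an assumption, that the density is the fraction of tokens in that same prompt set on which the neuron has a nonzero activation. The function and constant names are illustrative and not part of any dashboard API.

```python
# Figures quoted on the dashboard.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY = 0.121 / 100  # 0.121% expressed as a fraction


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of tokens across all dashboard prompts."""
    return num_prompts * tokens_per_prompt


def expected_active_tokens(n_tokens: int, density: float) -> float:
    """Approximate number of tokens on which the neuron fires.

    Assumption: density = (active tokens) / (all tokens in the prompt set).
    """
    return n_tokens * density


n_tokens = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
n_active = expected_active_tokens(n_tokens, ACTIVATION_DENSITY)

print(f"total tokens:          {n_tokens:,}")
print(f"approx. active tokens: {round(n_active):,}")
print(f"roughly one activation every {round(1 / ACTIVATION_DENSITY)} tokens")
```

Under that assumption the neuron fires on only a few thousand of roughly three million tokens, which would be consistent with a feature that selects a narrow class of nouns rather than a broad syntactic pattern.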