Explanations

Type followed by roman numerals

This neuron detects classification labels: words such as "Grade," "Type," or "Category" when they name a numbered or named class.

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

"for"  -2.23
"It"  -1.73
" sebagainya"  -1.54
"they"  -1.53
"👎"  -1.52
"He"  -1.52
"谕"  -1.52
"there"  -1.51
"☹"  -1.50
(token not rendered)  -1.49

Positive Logits

"Här"  1.93
"leute"  1.84
"땃"  1.69
"Descriere"  1.66
"правления"  1.66
"sätzlich"  1.64
"Stap"  1.55
"CCIÓN"  1.55
" muerta"  1.51
"Assemblée"  1.48
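The negative and positive lists rank tokens by how strongly this neuron pushes their logits down or up. The following is a minimal sketch of that ranking step only. It assumes the per-token effects are already available as a mapping from token string to value. Dashboards like this one commonly obtain those values by projecting the feature's output direction through the model's unembedding, but the page does not state its method. `top_logits` is a hypothetical helper, not part of any dashboard API.

```python
# Hypothetical helper: rank tokens by a feature's logit effect.
# Assumes `effects` maps token strings to precomputed logit contributions.
def top_logits(
    effects: dict[str, float], k: int
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return up to k most negative and up to k most positive (token, value) pairs."""
    # Ascending sort: most suppressed tokens first; keep only negative values.
    negative = [item for item in sorted(effects.items(), key=lambda item: item[1]) if item[1] < 0][:k]
    # Descending sort (stable, so ties keep input order); keep only positive values.
    positive = [
        item
        for item in sorted(effects.items(), key=lambda item: item[1], reverse=True)
        if item[1] > 0
    ][:k]
    return negative, positive


# Usage with a few values copied from the lists on this page.
# Leading spaces in token strings (e.g. " sebagainya") are significant.
sample = {"for": -2.23, "It": -1.73, " sebagainya": -1.54, "Här": 1.93, "leute": 1.84, "땃": 1.69}
neg, pos = top_logits(sample, 2)
print(neg)  # [('for', -2.23), ('It', -1.73)]
print(pos)  # [('Här', 1.93), ('leute', 1.84)]
```

Filtering by sign keeps a token with a positive effect from appearing in the negative list when k exceeds the number of negative entries.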

Activation density: 0.123%
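As a rough sense of scale, the following worked arithmetic makes two assumptions that the page does not state. First, it treats activation density as the fraction of tokens on which the neuron fires. Second, it treats the dashboard prompts listed under Configuration (24,576 prompts of 128 tokens each) as the token pool for that measurement. Because 0.123% is itself rounded, the resulting count is approximate.

```python
# Worked arithmetic under stated assumptions (not confirmed by the page):
# density = fraction of dashboard tokens on which the neuron is active.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY = 0.123 / 100  # 0.123% expressed as a fraction

total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT
active_tokens = total_tokens * DENSITY

print(total_tokens)          # 3145728
print(round(active_tokens))  # 3869
```

Under these assumptions, the neuron fires on roughly 3,900 of about 3.1 million tokens.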