Explanations

authorship attribution

The neuron flags words and forms related to authorship (e.g. author, authored, authoring).

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token      Logit
 orice     -2.25
鲞         -2.17
           -2.16
琑         -2.05
￬         -2.03
泚         -1.98
joyed      -1.98
🗿         -1.95
蚡         -1.88
”          -1.86

Positive Logits

Token      Logit
↵          2.39
ᨆ          2.14
現貨       2.06
Ꮉ          2.02
excellents 2.00
夘         1.99
 手作り    1.98
ဴ          1.98
المشار     1.97

Activations Density 0.009%
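
To put the density figure in context, here is a minimal sketch of the arithmetic, assuming that "Activations Density" means the fraction of all dashboard tokens on which the neuron has a nonzero activation (the page itself does not define the term). The function name `expected_active_tokens` is hypothetical and used only for illustration.

```python
def expected_active_tokens(num_prompts, tokens_per_prompt, density_percent):
    """Return (total tokens, approximate number of tokens that activate).

    Assumption: density is the percentage of all tokens in the prompt set
    on which the neuron fires. This definition is not stated on the page.
    """
    total_tokens = num_prompts * tokens_per_prompt
    active_tokens = total_tokens * density_percent / 100.0
    return total_tokens, active_tokens


# Values taken from the configuration above: 24,576 prompts of 128 tokens each,
# and a reported density of 0.009%.
total_tokens, active_tokens = expected_active_tokens(24_576, 128, 0.009)
print(f"total tokens: {total_tokens:,}")
print(f"approx. activating tokens: {active_tokens:.0f}")
```

Under that assumption, the neuron fires on only a few hundred of the roughly three million tokens in the prompt set, which fits a sparse feature tied to a narrow family of words.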