INDEX

Explanations

overall or though

The neuron activates on first-person opinion or experience markers (e.g. “I,” “will,” “would,” “was,” “to do again”), i.e. personal statements of preference or intent.

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token        Logit
" حتی"       -1.32
" даже"      -1.28
" thậm"      -1.19
" bahkan"    -1.14
" even"      -1.13
"лых"        -1.05
" навіть"    -1.02
" même"      -1.00
" sogar"     -0.99
"どの"        -0.96

Positive Logits

Token        Logit
"but"        2.25
" però"      1.45
" will"      1.43
" však"      1.34
" overall"   1.31
" Tuttavia"  1.24
" ولكن"      1.20
"BUT"        1.18
" pero"      1.16
" όμως"      1.16

Activations Density 0.004%
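
To give the density figure some scale, the sketch below combines it with the prompt configuration listed above (24,576 prompts, 128 tokens each). It assumes, without confirmation from the page, that the density is the fraction of those dashboard tokens on which the feature is active; the variable names are illustrative only, and the displayed 0.004% is rounded, so the result is a rough estimate.

```python
# Rough scale of the activation density, using figures from the dashboard.
# Assumption: density = fraction of dashboard tokens with nonzero activation.
num_prompts = 24_576          # from "Prompts (Dashboard)"
tokens_per_prompt = 128       # from "Prompts (Dashboard)"
density_percent = 0.004       # from "Activations Density" (rounded value)

total_tokens = num_prompts * tokens_per_prompt
approx_active = total_tokens * density_percent / 100

print(f"total tokens: {total_tokens:,}")
print(f"approx. activating tokens: {approx_active:.0f}")
```

Under that assumption, the feature fires on only about a hundred tokens out of roughly three million, which fits a sparse feature tied to a narrow set of words.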