Explanations

i don't think

This neuron detects first-person opinion or hedging phrases, especially “I think” and variants such as “I don’t think”.

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token             Logit
" bereits"        -1.44
"鎶"              -1.41
"縝"              -1.39
"牴"              -1.38
" становника"     -1.36
"expédition"      -1.36
"谘"              -1.36
" adverten"       -1.35
" trochu"         -1.35
"暱"              -1.34

Positive Logits

Token             Logit
"it"              2.03
" they"           1.80
"you"             1.49
(not rendered)    1.47
"//!"             1.34
"—"               1.34
" anyone"         1.31
"anc"             1.31
(not rendered)    1.26

Activations Density 0.010%
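
As a rough sanity check on the figures above, the short Python sketch below estimates how many tokens the neuron fires on. It assumes that "Activations Density" means the fraction of tokens in the dashboard prompt set with a non-zero activation, and that the density was measured over the same 24,576 prompts of 128 tokens each. Neither assumption is confirmed by the page itself. The function and constant names are hypothetical, chosen only for illustration.

```python
# Figures copied from the dashboard configuration and density readout.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.010  # displayed value, already rounded by the dashboard


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of token positions in the prompt set."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density_percent: float) -> float:
    """Estimated count of tokens with non-zero activation.

    Assumes density is the fraction of tokens on which the neuron fires
    (an interpretation, not something the page states).
    """
    return total * density_percent / 100.0


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = estimated_active_tokens(total, DENSITY_PERCENT)

print(f"total tokens: {total}")
print(f"estimated active tokens: {round(active)}")
```

Under these assumptions the prompt set holds 3,145,728 tokens, of which roughly 315 would activate the neuron. Because the displayed density is rounded to three decimal places, the true count could differ by a few percent.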