Explanations

explaining the situation

The neuron fires on first-person self-references and stance markers (e.g. “I,” “my,” “think,” “know,” “will,” “can,” “have”), that is, on phrases that express the author’s personal viewpoint or experience.

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Tokens are quoted so that leading spaces stay visible.

" Then"  -1.50
"but"  -1.47
")，"  -1.38
" also"  -1.37
"’-"  -1.34
"에도"  -1.31
"？"  -1.30
"萄"  -1.30
" …)"  -1.30
")–"  -1.30

Positive Logits

Tokens are quoted so that leading spaces stay visible.

"垍"  1.52
"ザイン"  1.33
"幸いです"  1.29
"ítu"  1.27
" くん"  1.25
"抱歉"  1.24
"萑"  1.23
"Celebrate"  1.23
" remarqué"  1.23
"嬉しいです"  1.20

Activations Density 0.039%
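
As a rough sanity check, the sketch below converts the reported activation density into an approximate token count. It assumes, which the page does not state, that the density is measured over the dashboard prompt set (24,576 prompts of 128 tokens each) and that it means the share of tokens with a nonzero activation. The variable names are illustrative only.

```python
# Figures copied from the dashboard page.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY_PERCENT = 0.039

# Assumption: density = share of dashboard tokens on which the neuron is active.
total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT
estimated_active_tokens = round(total_tokens * ACTIVATION_DENSITY_PERCENT / 100)

print(f"total tokens: {total_tokens:,}")
print(f"estimated active tokens: {estimated_active_tokens:,}")
```

Under that assumption the neuron would be active on roughly 1,200 of about 3.1 million tokens, which is consistent with a sparse feature that responds to a narrow class of inputs.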