Explanations

explaining reasons or beliefs

The neuron fires on sentences that open with, or heavily feature, first-person pronouns (I, we), which typically signal a personal statement or opinion.

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token         Logit
" even"       -2.20
" навіть"     -1.84
" даже"       -1.78
" however"    -1.70
" incluso"    -1.69
" should"     -1.66
" nawet"      -1.59
" sogar"      -1.53
" bahkan"     -1.53
"Should"      -1.42

Positive Logits

Token         Logit
"实在是"      1.48
"找不到"      1.31
" justement"  1.27
"實在"        1.23
" such"       1.21
" przecież"   1.12
" zoveel"     1.11
"实在"        1.09
"so"          1.06
" believes"   1.06

Activations Density 0.061%
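
To put the density figure in context, here is a minimal Python sketch that turns it into an approximate token count. It assumes (the page does not say so) that activation density is the share of all dashboard tokens on which the neuron is active, and that the dashboard used exactly the 24,576 prompts of 128 tokens listed under Configuration. The function names are hypothetical and exist only for illustration.

```python
# Figures copied from the dashboard configuration and density readout.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.061  # reported activation density, in percent


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of tokens the dashboard was computed over."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density_percent: float) -> float:
    """Estimate how many tokens activate the neuron.

    Assumption: density = active tokens / total tokens.
    """
    return total * density_percent / 100.0


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = estimated_active_tokens(total, DENSITY_PERCENT)

print(f"total tokens: {total:,}")
print(f"estimated active tokens: {active:,.0f}")
```

Under these assumptions the dashboard covers 3,145,728 tokens, of which roughly 1,900 would activate the neuron. Because the reported density is rounded to two significant figures, the estimate is only approximate.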