Explanations

probably followed by certain words

The neuron fires on hedging or uncertainty expressions (e.g. “probably,” qualifiers like “more,” and other words signaling doubt or tentative statements).

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token         Logit
" parfois"    -1.67
"</h2>"       -1.49
" semblent"   -1.41
" soms"       -1.37
"parently"    -1.32
" renon"      -1.31
" sembl"      -1.30
" pudiera"    -1.28
"طنين"        -1.26
" appara"     -1.25

Positive Logits

Token          Logit
" most"        1.59
" will"        1.48
" more"        1.24
" somewhere"   1.22
"won"          1.19
" reinen"      1.18
" wouldn"      1.17

Activations Density 0.029%
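
To give the density figure a sense of scale, the short sketch below converts it into an approximate token count. It assumes, and the page does not confirm this, that the 0.029% density is measured over the same 24,576 prompts of 128 tokens listed under Configuration; the function names are illustrative only.

```python
# Figures copied from the dashboard. ASSUMPTION: the activation density
# is computed over the dashboard prompt set (not confirmed by the page).
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.029


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of token positions in the prompt set."""
    return num_prompts * tokens_per_prompt


def expected_active_tokens(total: int, density_percent: float) -> float:
    """Approximate number of positions where the neuron is active."""
    return total * density_percent / 100.0


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = expected_active_tokens(total, DENSITY_PERCENT)

print(f"total tokens: {total:,}")
print(f"approx. active tokens: {active:.0f}")
```

Under that assumption, the neuron would be active on roughly nine hundred of about 3.1 million token positions, which is consistent with a sparse, narrowly selective unit.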