Explanations

"the" followed by specific nouns

tokens that never activate, i.e. an effectively inactive neuron

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

\]  0.26
0.25
0.24
0.22
.]  0.21
।  0.21
^{*}  0.21
𝗔  0.20

Positive Logits

to  0.30
 algunos  0.23
 soldats  0.22
kprop  0.22
 exemplu  0.21
of  0.21
bardziej  0.21
 dược  0.21
 актриса  0.21
at  0.21

Activations Density 0.012%
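
To put the density figure in context, the sketch below relates it to the dashboard prompt set listed under Configuration. It rests on two assumptions that the page does not state: that "activations density" means the percentage of tokens on which the feature has a nonzero activation, and that it was measured over the same 238,145 prompts of 512 tokens each. The function names are illustrative only and do not come from any dashboard API.

```python
# Assumption: density = percentage of tokens with a nonzero activation,
# measured over the dashboard prompt set (not confirmed by the page).

NUM_PROMPTS = 238_145        # from "Prompts (Dashboard)"
TOKENS_PER_PROMPT = 512      # from "Prompts (Dashboard)"
DENSITY_PERCENT = 0.012      # from "Activations Density"


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of token positions in the prompt set."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(num_prompts: int, tokens_per_prompt: int,
                            density_percent: float) -> float:
    """Approximate count of token positions where the feature fires."""
    return total_tokens(num_prompts, tokens_per_prompt) * density_percent / 100.0


def activation_density_percent(activations: list[float]) -> float:
    """Percentage of entries in a flat list of activations that are nonzero."""
    if not activations:
        raise ValueError("activations must not be empty")
    nonzero = sum(1 for a in activations if a != 0.0)
    return 100.0 * nonzero / len(activations)


if __name__ == "__main__":
    print(total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT))
    print(estimated_active_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT, DENSITY_PERCENT))
```

Under those assumptions the prompt set holds 121,930,240 token positions, so a density of 0.012% corresponds to roughly 14,600 positions with a nonzero activation.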