Explanations

abstain, withhold, desist

The neuron activates on words and phrases that express refraining from or avoiding an action (e.g., “refrain from,” “abstain,” “stay out of”).

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token             Logit
" flawless"       -1.84
"渽"              -1.78
" beneficial"     -1.63
" disheart"       -1.59
"つて"            -1.54
"甞"              -1.53
"过年"            -1.52
"いざ"            -1.52
"殲"              -1.51
" espé"           -1.50

Positive Logits

Token             Logit
"Ꞓ"               1.60
" nivå"           1.59
" バッジ"         1.55
"激烈"            1.55
"ens"             1.52
"ів"              1.52
"ek"              1.50
"⑊"               1.48
"ts"              1.47
"ited"            1.46

Activations Density 0.008%
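
To give the density figure a sense of scale, the following minimal sketch combines it with the prompt configuration listed above. It assumes, and the page does not confirm this, that the density is the fraction of tokens in that same prompt set on which the neuron has a nonzero activation. The function and constant names are illustrative only.

```python
# Figures copied from the Configuration and Activations Density entries.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.008  # reported activation density, in percent


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of token positions across all dashboard prompts."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density_percent: float) -> float:
    """Approximate count of positions where the neuron fires.

    Assumption: density is measured over the same token set as `total`.
    """
    return total * density_percent / 100.0


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = estimated_active_tokens(total, DENSITY_PERCENT)

print(f"total tokens: {total:,}")
print(f"estimated active tokens: {active:.0f}")
```

Under that assumption, the neuron would fire on only a few hundred of roughly three million token positions, which fits a narrow explanation such as "refraining from an action".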