Explanations

not followed by specific words

The neuron activates strongly on negation and negative-polarity words (e.g. “not,” “wouldn’t,” “little,” “expecting”).

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token           Logit
" lente"        -0.91
" solide"       -0.87
" intrig"       -0.86
" absolu"       -0.84
" bologna"      -0.82
"izations"      -0.81
" maioria"      -0.78
"тын"           -0.78
"bep"           -0.78
" comport"      -0.77

Positive Logits

Token           Logit
" envisaged"    0.95
"any"           0.88
"aktif"         0.87
"͈"             0.85
" packet"       0.85
"🏟"            0.84
"ENTON"         0.84
" Ти"           0.83
" Masyarakat"   0.83
" endless"      0.82

Activations Density 0.055%
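
To give the density figure a sense of scale, the short sketch below converts it into an approximate token count. It assumes, without confirmation from the dashboard, that the density is the fraction of tokens on which the neuron fires and that it was measured over the same 24,576 prompts of 128 tokens listed under Configuration. The constant and function names are illustrative only.

```python
# Figures copied from the dashboard; the interpretation is an assumption.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY_PERCENT = 0.055  # assumed: percent of tokens with nonzero activation


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of tokens across all dashboard prompts."""
    return num_prompts * tokens_per_prompt


def approx_active_tokens(total: int, density_percent: float) -> float:
    """Approximate number of tokens on which the neuron is active."""
    return total * density_percent / 100.0


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = approx_active_tokens(total, ACTIVATION_DENSITY_PERCENT)

print(f"total tokens: {total:,}")
print(f"approx. active tokens: {active:.0f}")
```

Under these assumptions the neuron fires on roughly 1,730 of about 3.1 million tokens, so it is sparse.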