
Explanations

any or too

This neuron detects words and phrases that express prohibition or prevention (e.g. "ensuring none talk", "don’t like the public peeking").

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token (quoted to show leading spaces)    Logit
" plagio"                                -0.98
"ume"                                    -0.97
" eser"                                  -0.94
" prejudices"                            -0.91
" Martí"                                 -0.90
"ziej"                                   -0.89
" totalitarian"                          -0.86
" jaro"                                  -0.85
"arshal"                                 -0.85
"しゃぶ"                                  -0.85

Positive Logits

Token (quoted to show leading spaces)    Logit
"too"                                    1.41
"any"                                    1.36
" слишком"                               1.26
"有任何"                                  1.22
" demasi"                                1.02
" anything"                              1.00
" groaned"                               0.99
" demasiado"                             0.92
"任何"                                    0.90
" qualquer"                              0.89

Activations Density: 0.092%
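
To put the density figure in context, the short Python sketch below estimates how many dashboard tokens the neuron fires on. It assumes that "activations density" means the fraction of all dashboard tokens with a non-zero activation, which this page does not state explicitly. The function names are hypothetical and exist only for illustration.

```python
# Rough estimate of how many tokens activate this neuron.
# Assumption (not stated on the page): density = fraction of all
# dashboard tokens on which the neuron has a non-zero activation.

NUM_PROMPTS = 24_576        # from "Prompts (Dashboard)"
TOKENS_PER_PROMPT = 128     # from "Prompts (Dashboard)"
DENSITY_PERCENT = 0.092     # from "Activations Density"


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of tokens across all dashboard prompts."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density_percent: float) -> int:
    """Approximate count of tokens with a non-zero activation."""
    return round(total * density_percent / 100)


if __name__ == "__main__":
    total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
    active = estimated_active_tokens(total, DENSITY_PERCENT)
    print(f"total tokens: {total:,}")
    print(f"estimated active tokens: {active:,}")
```

Under that assumption, the neuron is active on roughly 2,900 of about 3.1 million tokens, so it fires rarely.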