Explanations

aren't

The neuron detects occurrences of the negation contraction “aren’t.”

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

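Negative Logits

Token (quoted to preserve leading spaces)    Logit
" alebo"                                     -2.63
"焄"                                         -2.39
"is"                                         -2.39
"倌"                                         -2.28
" betale"                                    -2.27
" bidra"                                     -2.23
(token not shown)                            -2.23
(token not shown)                            -2.19
(token not shown)                            -2.13
"Ἢ"                                          -2.13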

Positive Logits

Token (quoted to preserve leading spaces)    Logit
"are"                                        2.61
" Перейти"                                   2.48
"</sub>"                                     2.42
"Sebagai"                                    2.41
" sebag"                                     2.41
" ungew"                                     2.28
"</i>"                                       2.25
"ка"                                         2.22
"that"                                       2.22
(token not shown)                            2.20

Activation Density: 0.008%
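
To get a feel for how sparse this is, the density can be turned into a rough token count. The sketch below assumes that the density is the fraction of tokens with nonzero activation, and that it was measured over the dashboard prompts listed under Configuration (24,576 prompts of 128 tokens each). Neither assumption is stated on the page, and the reported 0.008% is presumably rounded, so the result is only an estimate. The function names are illustrative.

```python
# Figures copied from the dashboard; how they relate is an assumption.
NUM_PROMPTS = 24_576        # "24,576 prompts"
TOKENS_PER_PROMPT = 128     # "128 tokens each"
DENSITY_PERCENT = 0.008     # "Activations Density 0.008%" (presumably rounded)


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of token positions across all dashboard prompts."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density_percent: float) -> float:
    """Estimated number of token positions where the neuron fires,
    assuming density = active tokens / total tokens."""
    return total * density_percent / 100.0


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = estimated_active_tokens(total, DENSITY_PERCENT)

print(f"total tokens: {total:,}")               # total tokens: 3,145,728
print(f"estimated active tokens: {active:.0f}")  # estimated active tokens: 252
```

Under these assumptions the neuron fires on roughly 250 of about 3.1 million tokens, which fits a detector for a single contraction such as "aren't".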