Explanations

good and evil

The neuron specifically detects occurrences of the word “good.”

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token (quoted to show leading spaces), Logit
" kupa", -1.55
"小伙伴", -1.39
" Ukraina", -1.36
" ſame", -1.36
" makam", -1.35
" begon", -1.34
" konsult", -1.34
" Diſ", -1.33
" onderdelen", -1.32
"ть", -1.31

Positive Logits

Token (quoted to show leading spaces), Logit
" спеціаль", 1.54
"has", 1.48
" Veränderung", 1.47
" яку", 1.44
"she", 1.42
"زوج", 1.41
" rzeczy", 1.40
"或者", 1.39
" merece", 1.36
"Those", 1.33

Activation Density: 0.015%
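
To put the density figure in context, the short sketch below converts it into an approximate token count using the prompt configuration listed above. It assumes that density means the fraction of all dashboard tokens on which the feature activates; the page does not state the definition, so treat the result as a rough estimate. The function name `expected_active_tokens` is hypothetical and used only for illustration.

```python
# Figures copied from this page.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.015  # reported activation density, in percent


def expected_active_tokens(num_prompts, tokens_per_prompt, density_percent):
    """Return (total tokens, estimated number of activating tokens).

    Assumption (not confirmed by the page): density is the share of
    all tokens in the dashboard prompts with a nonzero activation.
    """
    total = num_prompts * tokens_per_prompt
    return total, total * density_percent / 100


total_tokens, active_tokens = expected_active_tokens(
    NUM_PROMPTS, TOKENS_PER_PROMPT, DENSITY_PERCENT
)
print(f"total tokens: {total_tokens:,}")
print(f"estimated activating tokens: {round(active_tokens)}")
```

Under that assumption the feature fires on only a few hundred of roughly three million tokens, which fits a feature tied to a single word.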