INDEX

Explanations

acts of stealing

The neuron activates on words denoting acts of theft (e.g. steal, stole, stealing).

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token             Logit
" evoked"         -0.91
" royaume"        -0.88
"全て"            -0.86
" zcela"          -0.86
" populaires"     -0.85
"すべて"          -0.85
"⸙"               -0.84
"ровая"           -0.82
" YANG"           -0.82
" learnt"         -0.82

Positive Logits

Token             Logit
" from"           2.08
" steals"         1.17
" borrow"         1.07
" از"             1.06
" steal"          1.03
" chances"        1.01
"ustus"           0.99
" glances"        0.95
"thats"           0.95
"from"            0.94

Activations Density 0.028%
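
As a rough sanity check on the density figure, the sketch below multiplies out the dashboard configuration (24,576 prompts of 128 tokens each) and applies the 0.028% density. It assumes the density is the fraction of dashboard tokens on which the feature fires at all, which this page does not state; the function names are illustrative only.

```python
def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of tokens in the dashboard prompt set."""
    return num_prompts * tokens_per_prompt


def expected_active_tokens(density_percent: float, n_tokens: int) -> float:
    """Approximate count of tokens on which the feature is active.

    Assumption: density is the share of tokens with a nonzero activation.
    """
    return density_percent / 100.0 * n_tokens


n_tokens = total_tokens(24_576, 128)
active = expected_active_tokens(0.028, n_tokens)
print(f"{n_tokens:,} tokens, roughly {round(active)} with the feature active")
```

Under that assumption the feature fires on fewer than a thousand of the roughly 3.1 million dashboard tokens, which is consistent with a narrow, word-specific feature.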