Explanations

could be

The neuron fires on words that express possibility or uncertainty, that is, modal auxiliaries and hedge terms such as may, might, could, and possible.

Configuration

Prompts (Dashboard)

392,802 prompts, 256 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token (quoted to show leading spaces)    Value
" なかっ"    1.20
"қ"    1.20
"Ｎ"    1.15
" poter"    1.09
"Ｏ"    1.07
" keď"    1.05
"ी"    1.02
"𝗴"    1.00
"𝗦"    1.00
"𝗡"    1.00

Positive Logits

Token (quoted to show leading spaces)    Value
"س"    1.52
" conceivably"    1.41
"ารย์"    1.30
" Possibly"    1.16
" possibly"    1.12
"ა"    1.09
"possibly"    1.07
"tom"    1.05
"ن"    1.05
"گزاری"    1.03

Activations Density 0.561%
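
To give the density figure a sense of scale, the sketch below combines it with the prompt configuration listed above. It rests on assumptions the page does not state: that every prompt contributes exactly 256 tokens, that the density was measured over this same prompt set, and that "density" means the fraction of tokens with a nonzero activation. The function names are hypothetical and used only for illustration.

```python
# Figures copied from the dashboard configuration above.
NUM_PROMPTS = 392_802
TOKENS_PER_PROMPT = 256
ACTIVATION_DENSITY = 0.561 / 100  # 0.561% expressed as a fraction


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total tokens, assuming every prompt has exactly the stated length."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(n_tokens: int, density: float) -> int:
    """Estimated count of activating tokens.

    Assumes density is the fraction of tokens on which the neuron has a
    nonzero activation, measured over the same token set.
    """
    return round(n_tokens * density)


n_tokens = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
n_active = estimated_active_tokens(n_tokens, ACTIVATION_DENSITY)

print(f"Total tokens: {n_tokens:,}")
print(f"Estimated activating tokens: {n_active:,}")
```

Under these assumptions the prompt set holds about 100.6 million tokens, of which roughly 564 thousand would activate the neuron. This is an order-of-magnitude estimate only, since the published density is rounded to three significant figures.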