Explanations

certainty

The neuron detects words expressing certainty or confidence (e.g., “certain,” “sure”).

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each
Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token         Logit
"dö"          -1.26
" löst"       -1.17
"視野"        -1.11
"掟"          -1.08
" schlägt"    -1.08
"zembro"      -1.05
"</sup>"      -1.03
"Jawab"       -1.02
"Sebelum"     -1.02
" fährt"      -1.02

Positive Logits

Token         Logit
"of"          3.61
" that"       2.56
" they"       1.24
" about"      1.17
" SURE"       1.15
"ši"          1.12
" medži"      1.07
"੧"           1.07
"ly"          1.02
"ของ"         1.02

Activations Density 0.020%
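
To put the density figure in context, the short sketch below combines it with the dashboard configuration listed above. It assumes, as a working interpretation not stated on this page, that the density is the fraction of token positions across the dashboard prompts on which the neuron has a nonzero activation. The variable names are illustrative only.

```python
# Figures taken from the page: 24,576 prompts of 128 tokens each,
# and an activation density of 0.020%.
PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.020

# Total token positions covered by the dashboard prompts.
total_tokens = PROMPTS * TOKENS_PER_PROMPT

# ASSUMPTION: density = share of token positions with nonzero activation.
# Under that reading, this is the approximate count of activating tokens.
expected_active = round(total_tokens * DENSITY_PERCENT / 100)

print(f"total token positions: {total_tokens:,}")
print(f"approx. activating tokens: {expected_active:,}")
```

Under that assumption the neuron fires on only a few hundred of roughly three million token positions, which fits a narrow detector for words such as "certain" and "sure".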