Explanations

ratings and classifications

The neuron responds to the word “Rated” (or “rated”) followed by a numeric score, that is, to instances where something is given a rating.
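
As a rough illustration of the kind of text this explanation describes, the pattern can be approximated with a regular expression. This is only a sketch: the name `RATED_SCORE`, the helper `find_rated_scores`, and the exact regex are assumptions made for illustration, and the neuron's real behavior is learned and need not coincide with any fixed pattern.

```python
import re

# Hypothetical approximation of the explanation: "Rated" or "rated",
# then whitespace, then an integer or decimal score.
# This is NOT the neuron's actual computation, only a text-level sketch.
RATED_SCORE = re.compile(r"\b[Rr]ated\s+(\d+(?:\.\d+)?)")


def find_rated_scores(text: str) -> list[str]:
    """Return the numeric scores that directly follow 'Rated'/'rated'."""
    return RATED_SCORE.findall(text)


if __name__ == "__main__":
    samples = [
        "Rated 4.5 out of 5 stars",
        "The film was rated 9/10 by critics",
        "A top-rated hotel near the beach",  # no score follows, so no match
    ]
    for sample in samples:
        print(sample, "->", find_rated_scores(sample))
```

A check like this could be used to pre-filter candidate prompts before comparing them against the neuron's recorded activations.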

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token    Logit
).    -2.75
藹    -2.73
 duża    -2.41
 After    -2.34
煦    -2.28
崐    -2.27
ญี่ป    -2.25
戬    -2.20
酩    -2.17
一切都    -2.14

Positive Logits

Token    Logit
’    3.13
(not shown)    2.59
但    2.45
(not shown)    2.41
 ángeles    2.23
ی    2.22
(not shown)    2.20
很好的    2.19
?"    2.17
一旁    2.17

Activations Density 0.017%
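
To put the density figure in context, the arithmetic below combines it with the prompt configuration listed above. It is a sketch under one assumption that the page does not confirm: that the density was measured over the same 24,576 prompts of 128 tokens each. The variable names are illustrative.

```python
# Values taken from the dashboard configuration and density line.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.017

# Total number of tokens in the dashboard prompt set.
total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT

# ASSUMPTION: the density was computed over this same prompt set.
expected_active = total_tokens * DENSITY_PERCENT / 100

print(f"total tokens: {total_tokens:,}")
print(f"approx. tokens with nonzero activation: {round(expected_active):,}")
```

Under that assumption, the neuron would fire on roughly 535 of about 3.1 million tokens, which is consistent with a sparse, narrowly selective unit.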