Explanations

providing reasons

The neuron detects first‐person declarations or personal preference statements (sentences starting with “I” expressing desires, likes, or intentions).

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each
Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token        Logit
madeus       -1.05
Jereo        -0.94
❛            -0.93
struzioni    -0.92
സ്           -0.91
 добавил     -0.89
ktır         -0.88
 добра       -0.85
Unisex       -0.85
 Tampoco     -0.85

Positive Logits

Token        Logit
 because     1.73
 karena      1.25
 ponieważ    1.18
 porque      1.12
vì           1.10
 protože     1.00
 lembrar     0.98
porque       0.96
 mostly      0.95
 çünkü       0.94

Activations Density 0.038%
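
To put the density figure in context, the sketch below estimates how many tokens it corresponds to. It assumes, without confirmation from the page, that the density is the share of tokens with a nonzero activation and that it was measured over the dashboard prompts listed under Configuration. The function name is hypothetical.

```python
# Rough estimate of how many tokens activate this feature.
# ASSUMPTION: density = percentage of tokens with nonzero activation,
# measured over the dashboard prompts (24,576 prompts x 128 tokens).

N_PROMPTS = 24_576          # from the Configuration panel
TOKENS_PER_PROMPT = 128     # from the Configuration panel
DENSITY_PERCENT = 0.038     # from the Activations Density line


def estimated_active_tokens(n_prompts, tokens_per_prompt, density_percent):
    """Return (total token count, estimated number of activating tokens)."""
    total = n_prompts * tokens_per_prompt
    return total, total * density_percent / 100.0


total_tokens, active_tokens = estimated_active_tokens(
    N_PROMPTS, TOKENS_PER_PROMPT, DENSITY_PERCENT
)
print(f"total tokens: {total_tokens:,}")
print(f"estimated activating tokens: {active_tokens:,.0f}")
```

Under these assumptions the feature fires on roughly 1,200 of about 3.1 million tokens, which would make it a sparse feature.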