
Explanations

interests and dislikes

The neuron activates on nouns referring to personal interests, hobbies, or things someone loves or is passionate about.

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token         Logit
"for"         -1.42
"on"          -1.36
" there"      -1.27
"in"          -1.24
" each"       -1.18
" without"    -1.05
" with"       -1.03
" while"      -1.03
" upon"       -1.02
" under"      -1.01

Positive Logits

Token         Logit
"and"         1.30
"teilte"      1.19
"agreed"      1.16
"announced"   1.12
" disques"    1.11
" étan"       1.09
"喜欢"        1.07
"geren"       1.04
"hates"       1.01
" любую"      1.00
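
The two logit lists can be read as the extremes of a single ranking of tokens by logit value. The sketch below is a minimal illustration of that reading, assuming each number is a per-token logit effect; the names `logit_effects` and `top_logits` are hypothetical, and the values are copied from the lists on this page rather than computed from any model.

```python
# Logit values copied from the lists on this page (token -> value).
# Leading spaces are part of the token text and are kept on purpose.
logit_effects = {
    "for": -1.42, "on": -1.36, " there": -1.27, "in": -1.24, " each": -1.18,
    " without": -1.05, " with": -1.03, " while": -1.03, " upon": -1.02,
    " under": -1.01,
    "and": 1.30, "teilte": 1.19, "agreed": 1.16, "announced": 1.12,
    " disques": 1.11, " étan": 1.09, "喜欢": 1.07, "geren": 1.04,
    "hates": 1.01, " любую": 1.00,
}


def top_logits(effects, k):
    """Return the k most negative and k most positive (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda item: item[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[-k:][::-1]   # most positive first
    return negative, positive


negative, positive = top_logits(logit_effects, 3)
print("negative:", negative)
print("positive:", positive)
```

Sorting once and slicing both ends keeps the two lists consistent with each other, which is useful when comparing them against the explanation text for the neuron.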

Activations Density: 0.038%
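
To put the density figure in context, it can be converted into an approximate token count. The sketch below assumes that the density is the fraction of tokens with a nonzero activation and that it was measured over the dashboard prompt set listed under Configuration (24,576 prompts of 128 tokens each); neither assumption is stated on the page, and the function name is hypothetical.

```python
# Figures taken from this page. Treating "density" as the fraction of
# tokens with a nonzero activation is an assumption, not a stated fact.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.038


def estimated_active_tokens(num_prompts, tokens_per_prompt, density_percent):
    """Return (total tokens, estimated number of tokens that activate)."""
    total_tokens = num_prompts * tokens_per_prompt
    return total_tokens, total_tokens * density_percent / 100.0


total, active = estimated_active_tokens(
    NUM_PROMPTS, TOKENS_PER_PROMPT, DENSITY_PERCENT
)
print(f"{total:,} tokens in total, roughly {active:,.0f} with a nonzero activation")
```

Under these assumptions the neuron fires on only about one token in every 2,600, which is the kind of sparsity expected of a unit tied to a narrow concept.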