INDEX

Explanations

believe, trust, listen

The neuron flags negation, especially the word “not” and the immediate context that marks a statement as negative.

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token       Logit
"క"         -0.75
" ਦਾ"       -0.71
"logr"      -0.70
"zburg"     -0.68
"degrad"    -0.68
"던"        -0.68
"雪花"      -0.67
"yeon"      -0.66
" pojem"    -0.66
" thiệu"    -0.66

Positive Logits

Token           Logit
" listen"       5.34
" listened"     4.81
" listening"    4.63
" listens"      4.53
"listen"        4.44
" Listen"       4.38
"Listen"        4.31
" Listening"    3.59
"listening"     3.56
"Listening"     3.41

Activations Density 0.044%
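
To put the density figure in context, the short sketch below combines it with the prompt configuration listed above (24,576 prompts of 128 tokens each). It rests on an assumption that the page itself does not state: that activation density means the share of token positions in those prompts where the activation is nonzero. The function names are hypothetical and chosen only for illustration, and because the density is shown rounded to 0.044%, the result is an estimate rather than an exact count.

```python
# Values copied from the dashboard configuration and density line.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY_PERCENT = 0.044


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of token positions across all dashboard prompts."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density_percent: float) -> int:
    """Approximate number of token positions with a nonzero activation.

    ASSUMPTION: density is the fraction of token positions that activate.
    The dashboard does not define the term, so treat this as a rough guide.
    """
    return round(total * density_percent / 100)


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = estimated_active_tokens(total, ACTIVATION_DENSITY_PERCENT)

print(total)   # 3145728
print(active)  # 1384
```

Under that assumption, the neuron fires on roughly 1,400 of about 3.1 million token positions, which fits a unit that responds to a narrow set of tokens.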