Explanations

praising or condemning actions

The neuron detects positive appraisal language: verbs and other words that express praise or commendation.

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token            Logit
"طبيق"           -1.02
" please"        -0.98
"tilage"         -0.95
"儼"             -0.95
"говари"         -0.92
" зимой"         -0.91
"Zubereitung"    -0.91
"breviations"    -0.90
" вопрос"        -0.90
" 用户"          -0.89

Positive Logits

Token            Logit
"the"            1.60
" efforts"       1.48
"how"            1.28
"worthy"         1.23
"decision"       1.12
" effort"        1.12
" anew"          1.11
"him"            1.09
" şeyi"          1.07
"efforts"        1.07
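
Logit lists like the two above are commonly produced by projecting a neuron's output direction through the model's unembedding matrix and ranking tokens by the resulting score. Whether this dashboard computes its values exactly that way is an assumption. The sketch below illustrates the idea with a toy vocabulary and made-up weights; the function names and all numbers are hypothetical and are not the values shown on this page.

```python
def logit_effects(output_direction, unembedding_rows, vocab):
    """Score each token by the dot product of the neuron's output
    direction with that token's unembedding vector."""
    effects = {}
    for token, row in zip(vocab, unembedding_rows):
        effects[token] = sum(d * u for d, u in zip(output_direction, row))
    return effects


def top_and_bottom(effects, k):
    """Return the k most boosted and the k most suppressed tokens."""
    ranked = sorted(effects.items(), key=lambda item: item[1], reverse=True)
    most_positive = ranked[:k]
    most_negative = ranked[-k:][::-1]  # most negative first
    return most_positive, most_negative


# Toy data (hypothetical): 4 tokens, 2-dimensional residual stream.
vocab = [" efforts", "worthy", " please", "tilage"]
unembedding_rows = [
    [1.0, 0.5],
    [0.8, 0.2],
    [-0.9, 0.1],
    [-0.6, -0.3],
]
output_direction = [1.0, 0.5]

effects = logit_effects(output_direction, unembedding_rows, vocab)
positive, negative = top_and_bottom(effects, k=2)

print("positive:", positive)
print("negative:", negative)
```

Leading spaces are kept inside the token strings because " efforts" and "efforts" are distinct vocabulary entries.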

Activations Density 0.017%
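
As a rough sanity check, the density figure can be converted into an approximate token count. This sketch assumes, which the page does not state, that density is the fraction of dashboard tokens on which the neuron has a non-zero activation, and that it is measured over the full prompt set listed under Configuration. The variable names are illustrative.

```python
# Figures copied from the dashboard.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.017  # "Activations Density"

total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT

# Assumption: density = activating tokens / total tokens.
estimated_active_tokens = round(total_tokens * DENSITY_PERCENT / 100)

print(f"total tokens: {total_tokens:,}")
print(f"estimated activating tokens: {estimated_active_tokens}")
```

Under these assumptions the neuron fires on only a few hundred of roughly three million tokens, which is consistent with a sparse, narrowly selective unit.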