Explanations

proper ethical conduct

The neuron is tuned to detect directive or advisory language—words and verbs that convey instructions, suggestions, or calls to action.

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token         Logit
" keuze"      -1.22
" generó"     -1.16
"udahkan"     -1.14
"yet"         -1.13
"了出去"      -1.13
"却被"        -1.12
" vanske"     -1.09
"charest"     -1.08
"ПРЕ"         -1.08
"atoga"       -1.06

Positive Logits

Token             Logit
" what"           1.38
" benar"          1.30
" good"           1.27
" сначала"        1.16
" better"         1.12
" behave"         1.10
" best"           1.09
" proper"         1.09
" considerare"    1.06
" healthy"        1.04

Activations Density 0.025%
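
To give a sense of scale, the configuration and density figures can be combined into a rough token count. This is only a sketch: it assumes the density is the fraction of dashboard tokens on which the feature has a nonzero activation, and that it was measured over the same 24,576 prompts of 128 tokens. The page confirms neither. The function names are illustrative.

```python
# Figures copied from the dashboard page.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY_PERCENT = 0.025


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of tokens across all dashboard prompts."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density_percent: float) -> float:
    """Estimated number of tokens on which the feature fires.

    Assumption (not stated on the page): density is the percentage of
    tokens with a nonzero activation over the dashboard prompts.
    """
    return total * density_percent / 100.0


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = estimated_active_tokens(total, ACTIVATION_DENSITY_PERCENT)

print(f"total tokens: {total:,}")
print(f"estimated active tokens: {round(active):,}")
```

Under those assumptions the feature would fire on only a few hundred of roughly three million tokens, which is consistent with a sparse, narrowly tuned feature.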