Explanations

could say

The neuron fires on hedging or stance-marking verbs and phrases (e.g. “could,” “argue,” “say,” “think”) that signal speculation or opinion.
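As a rough way to sanity-check this explanation against text, the sketch below flags the example words it names. This is only a keyword proxy, not the neuron itself: the word list is copied from the explanation and is not exhaustive, and the names `HEDGE_WORDS` and `find_hedge_words` are hypothetical.

```python
import re

# Example words taken from the explanation above (illustrative, not exhaustive).
HEDGE_WORDS = ("could", "argue", "say", "think")

# Whole-word, case-insensitive match on any of the example words.
HEDGE_PATTERN = re.compile(r"\b(" + "|".join(HEDGE_WORDS) + r")\b", re.IGNORECASE)


def find_hedge_words(text: str) -> list[str]:
    """Return the matched hedge/stance words, lowercased, in order of appearance."""
    return [m.group(1).lower() for m in HEDGE_PATTERN.finditer(text)]


sample = "Some could argue the result holds, but I think it does not."
matches = find_hedge_words(sample)
print(matches)
```

A keyword match like this ignores context, so it would be expected to both over- and under-count relative to the neuron's real activations.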

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

-1.73

↵↵↵↵↵↵↵↵↵↵↵↵

-1.63

-1.51

↵↵↵↵↵↵↵↵↵

-1.50

↵↵↵↵↵↵

-1.48

for

-1.45

↵↵↵↵↵↵↵

-1.42

↵↵↵

-1.42

-1.41

↵↵↵↵↵↵↵↵↵↵↵

-1.38

Positive Logits

 also        1.66
 retrô       1.58
 prenez      1.55
 óculos      1.50
 protège     1.45
 klap        1.39
 akumulator  1.38
 veneno      1.37
 garantit    1.34
 rús         1.34

Activations Density 0.015%
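
To put the density figure in context, the arithmetic below estimates how many tokens it corresponds to. It assumes that the density is the fraction of tokens with non-zero activation and that it was measured over the dashboard prompts listed under Configuration. Neither assumption is confirmed by the page, and the variable names are illustrative.

```python
# Figures taken from the Configuration section and the density line.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.015  # assumed: share of tokens with non-zero activation

# Total tokens across all dashboard prompts.
total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT

# Approximate number of activating tokens under the assumption above.
expected_active = total_tokens * DENSITY_PERCENT / 100

print(f"total tokens: {total_tokens:,}")
print(f"approx. activating tokens: {round(expected_active)}")
```

Under these assumptions the neuron would be active on roughly 470 of about 3.1 million tokens.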