Explanations

surprisingly complex or essentially describing

The neuron detects evaluative or discourse-marker words that signal the author’s attitude or stance (e.g., “unfortunately,” “fortunately,” “probably,” “actually”).
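
As a rough illustration of the kind of token this explanation points at, the following is a minimal keyword sketch. It is not the neuron's actual computation: the word list contains only the four examples named in the explanation, and the function name `find_stance_markers` is hypothetical.

```python
import re

# Only the four example words from the explanation; the real feature
# presumably responds to a broader, context-dependent set (assumption).
STANCE_MARKERS = {"unfortunately", "fortunately", "probably", "actually"}


def find_stance_markers(text: str) -> list[str]:
    """Return the words in `text` that match the example stance markers."""
    words = re.findall(r"[A-Za-z']+", text)
    return [w for w in words if w.lower() in STANCE_MARKERS]


if __name__ == "__main__":
    sample = "Unfortunately, it will probably rain."
    print(find_stance_markers(sample))
```

A keyword match like this ignores context, so it can only serve as a sanity check against the top activating examples, not as a replacement for them.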

Configuration

Prompts (Dashboard): 392,802 prompts, 256 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

cómo  0.73
Kako  0.71
ҷ  0.70
Jeśli  0.64
 possibilità  0.63
 compréhension  0.62
্রিয়া  0.61
Cómo  0.61
éclair  0.61
 chrét  0.60

Positive Logits

0.88

 زیادی

0.78

not

0.75

(?)

0.72

 residing

0.71

 also

0.70

也有

0.70

 coincides

0.70

有不少

0.70

 with

0.70

Activations Density 0.406%
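
To put the density figure in context, the sketch below estimates how many token positions it corresponds to. It rests on two assumptions that the page does not state: that the density is the fraction of token positions with a nonzero activation, and that it was measured over the dashboard prompts listed under Configuration. The function names are illustrative only.

```python
# Figures copied from the dashboard; their interpretation is an assumption.
NUM_PROMPTS = 392_802          # prompts in the dashboard set
TOKENS_PER_PROMPT = 256        # tokens per prompt
ACTIVATION_DENSITY = 0.406 / 100  # 0.406% expressed as a fraction


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total token positions across all dashboard prompts."""
    return num_prompts * tokens_per_prompt


def expected_active_tokens(total: int, density: float) -> int:
    """Approximate count of positions where the feature is active."""
    return round(total * density)


if __name__ == "__main__":
    total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
    active = expected_active_tokens(total, ACTIVATION_DENSITY)
    print(f"total token positions: {total:,}")
    print(f"estimated active positions: {active:,}")
```

Under those assumptions the feature would fire on roughly 408,000 of about 100.6 million token positions, which is sparse but not rare.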