Explanations

I'm not sure

The neuron detects hedging or uncertainty expressions, most strongly “not sure” (and variants like “not sure if/how/that…”).

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Logit   Token
-2.02   "to"
-1.91   (token not rendered)
-1.90   "～～～～"
-1.88   " notwendig"
-1.83   " 結果"
-1.77   "Those"
-1.77   " вернулся"
-1.77   " فهي"
-1.75   " communs"
-1.72   " temporaire"

Positive Logits

Logit   Token
1.88    " about"
1.86    (token not rendered)
1.85    " apprehension"
1.76    "՚"
1.71    " whether"
1.70    "»,"
1.67    "違って"
1.66    (token not rendered)
1.64    (token not rendered)
1.63    " começaram"

Activations Density 0.012%
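
To give a sense of scale for the density figure, the following minimal sketch combines it with the prompt configuration listed under Configuration. It assumes, and this page does not confirm, that the density is the share of dashboard tokens on which the neuron has a nonzero activation and that it was measured over exactly those 24,576 prompts of 128 tokens. The function and constant names are hypothetical and exist only for illustration.

```python
# Values copied from the dashboard text above.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY_PERCENT = 0.012


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of token positions in the dashboard sample."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density_percent: float) -> int:
    """Approximate count of token positions where the neuron fires.

    ASSUMPTION: density is the percentage of sampled tokens with a
    nonzero activation. The dashboard does not state this explicitly.
    """
    return round(total * density_percent / 100)


if __name__ == "__main__":
    total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
    active = estimated_active_tokens(total, ACTIVATION_DENSITY_PERCENT)
    print(f"token positions sampled: {total:,}")
    print(f"estimated activating positions: {active:,}")
```

Under that assumption, the sample holds 3,145,728 token positions and the neuron fires on roughly 377 of them, which fits a neuron tied to a narrow phrase such as "not sure".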