Explanations

policy and regulations

The neuron fires on occurrences of the word “policy” (including in URL paths and titles).

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

 appelez  -0.85
 paciencia  -0.82
conven  -0.79
 מס  -0.76
routed  -0.75
meas  -0.73
 выгод  -0.73
 заме  -0.71
hor  -0.71
 confortável  -0.70

Positive Logits

 reactant  0.94
_))  0.78
 stubs  0.78
GAT  0.78
 vento  0.78
ங  0.77
$)$  0.75
ӷ  0.74
‟  0.74
 spamming  0.74

Activations Density 0.035%
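
To give a rough sense of scale for the density figure, the sketch below converts it into an approximate token count. It assumes the density is the fraction of tokens on which the neuron is active, measured over the prompt set listed under Configuration (24,576 prompts of 128 tokens each); the page does not state this explicitly. The function name `estimate_active_tokens` is hypothetical and used only for illustration.

```python
# Figures copied from the Configuration and Activations Density entries.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY_PERCENT = 0.035


def estimate_active_tokens(num_prompts, tokens_per_prompt, density_percent):
    """Return (total tokens, estimated tokens with nonzero activation).

    Assumption: density is a per-token fraction over the whole prompt set.
    """
    total = num_prompts * tokens_per_prompt
    active = round(total * density_percent / 100)
    return total, active


total_tokens, active_tokens = estimate_active_tokens(
    NUM_PROMPTS, TOKENS_PER_PROMPT, ACTIVATION_DENSITY_PERCENT
)
print(f"total tokens: {total_tokens:,}")
print(f"estimated active tokens: {active_tokens:,}")
```

Under that assumption, the neuron would be active on roughly 1,100 of about 3.1 million tokens, which fits a feature tied to a single word such as “policy”.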