Explanations

warning about potential danger

The neuron fires on verbs and modal verbs, together with the conjunctions that accompany them, when they state actions, requests, or warnings in procedural or "what will happen" statements.

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token             Logit
" suspen"         -0.96
" kazak"          -0.93
" marques"        -0.92
" incendi"        -0.91
"thérapie"        -0.91
"粼"              -0.90
"même"            -0.90
"aneamente"       -0.88
" menimbulkan"    -0.88
" تقویت"          -0.87

Positive Logits

Token             Logit
" alert"          1.39
" aware"          1.34
" awareness"      1.15
" potential"      1.08
" alerts"         1.06
" timely"         1.06
" potentially"    1.01
" alerted"        0.99
" warn"           0.98
"提醒"            0.96

Activation Density: 0.075%
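
To give the density figure some scale, the short sketch below estimates how many tokens that percentage corresponds to. It assumes, and the page does not confirm this, that the density is the fraction of tokens with a nonzero activation and that it was measured over the same 24,576 prompts of 128 tokens listed under Configuration. The function names are hypothetical, and because the displayed density is rounded, the result is only approximate.

```python
# Figures taken from the dashboard page.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY = 0.075 / 100  # 0.075% expressed as a fraction


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of token positions in the prompt set."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density: float) -> int:
    """Approximate count of tokens on which the neuron activates.

    Assumption: density = (tokens with nonzero activation) / (all tokens).
    """
    return round(total * density)


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = estimated_active_tokens(total, ACTIVATION_DENSITY)
print(f"total tokens: {total:,}")
print(f"estimated activating tokens: {active:,}")
```

Under these assumptions the neuron is active on roughly one token in every 1,300, which fits a feature tied to a narrow theme such as warnings.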