INDEX

Explanations

problematic or excessive situations

The neuron responds to words that denote threat or danger—especially terms like “menace,” “menacing,” and “rampant.”

New Auto-Interp

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Embeds

IFrame

Link

Not in Any Lists

Negative Logits

ные

-2.84

酲

-2.80

玧

-2.70

-2.66

”;

-2.64

-2.58

”(

-2.52

”?

-2.52

姈

-2.52

”:

-2.50

POSITIVE LOGITS

The

2.33

</b>

2.25

VIATIONS

2.22

佧

2.20

 😂

1.98

 wirklich

1.97

 élevés

1.93

 freaking

1.91

 sekarang

1.90

 sommeil

1.89

Activations Density 0.001%