
Explanations


The neuron fires on aggressive, threatening, or warlike language that signals violence or extreme confrontation.


Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted


Negative Logits

"its"  -1.29
" Bereichen"  -1.04
"or"  -1.02
"ּוֹ"  -1.01
"and"  -0.93
"香水"  -0.91
"大半"  -0.91
"że"  -0.90
"at"  -0.90
" ziemlich"  -0.90

Positive Logits

" addirittura"  1.13
(token not shown)  1.05
"alingen"  1.05
" immediatamente"  1.02
"大丈夫です"  0.99
"Prijs"  0.98
"こともある"  0.97
"いいですね"  0.96
"{["  0.96
" heretofore"  0.96
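The negative and positive logit lists rank vocabulary tokens by how strongly this feature pushes their output logits down or up. Lists like these are commonly produced by projecting the feature's direction through the model's unembedding matrix and taking the bottom-k and top-k entries. The sketch below assumes that approach; `direction`, `W_U`, and the toy vocabulary are hypothetical stand-ins, not values from this dashboard.

```python
def logit_effects(direction, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, score) pairs.

    direction: hypothetical feature vector of length d_model.
    W_U: hypothetical unembedding matrix, d_model rows of len(vocab) floats.
    vocab: token strings, one per column of W_U.
    """
    d_model = len(direction)
    # Score each token by the dot product of the direction with its unembedding column.
    scores = [
        sum(direction[i] * W_U[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # most suppressed tokens, lowest score first
    positive = ranked[::-1][:k]  # most promoted tokens, highest score first
    return negative, positive


# Toy example: d_model = 2, vocabulary of 4 tokens (all values hypothetical).
direction = [1.0, -0.5]
W_U = [
    [0.2, 1.0, -0.4, 0.0],
    [0.4, -1.0, 0.2, 2.0],
]
vocab = ["a", "b", "c", "d"]
neg, pos = logit_effects(direction, W_U, vocab, k=2)
print([t for t, _ in neg])  # ['d', 'c']
print([t for t, _ in pos])  # ['b', 'a']
```

A full sort keeps the sketch short. For a real vocabulary, `heapq.nsmallest` and `heapq.nlargest` (or a tensor library's top-k) avoid sorting every token.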

Activation Density: 0.082%
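Activation density is the fraction of token positions on which the feature is active. Suppose it were measured over the 24,576 × 128 = 3,145,728 token positions listed under Configuration. The dashboard does not say this, so treat it as an assumption. Under that assumption, 0.082% would correspond to roughly 2,579 active positions. The minimal sketch below uses a hypothetical activation threshold of 0 and toy data:

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    activations: one list of per-token activations per prompt.
    threshold: hypothetical cutoff; 0.0 counts any positive activation.
    """
    total = 0
    active = 0
    for prompt in activations:
        for value in prompt:
            total += 1
            if value > threshold:
                active += 1
    return active / total if total else 0.0


# Hypothetical toy data: 2 prompts x 4 tokens, one active position.
toy = [[0.0, 0.0, 3.2, 0.0], [0.0, 0.0, 0.0, 0.0]]
print(activation_density(toy))  # 0.125

# Assumed scale: the configured prompt set from this dashboard.
n_prompts, n_tokens = 24_576, 128
total_tokens = n_prompts * n_tokens          # 3,145,728 positions
expected_active = total_tokens * 0.082 / 100
print(total_tokens, round(expected_active))  # 3145728 2579
```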