Explanations

prejudice and preposterous

This neuron primarily detects the word “prejudice” and closely related bias- or discrimination-themed terms.

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token                Logit
"ne"                 -2.56
"࿚"                  -2.55
"ৢ"                  -2.52
" включая"           -2.50
"僂"                 -2.47
(token not shown)    -2.45
" souligné"          -2.44
"我才"               -2.36
"ၞ"                  -2.34
"家主"               -2.33

Positive Logits

Token                Logit
(token not shown)    3.42
" 駅前"              2.83
"也"                 2.61
" белые"             2.48
" oock"              2.47
" laget"             2.38
" 飯店"              2.33
" daer"              2.33
" 纹身"              2.31

Activations Density 0.002%
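
As a rough check on what the density figure implies, the sketch below multiplies the prompt count by the prompt length from the configuration and applies the 0.002% density. It assumes the density is measured over those same dashboard prompts and that 0.002% is a rounded display value; the page states neither, and the function name is hypothetical.

```python
# Figures copied from the Configuration and Activations Density lines.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.002  # displayed value, presumably rounded


def implied_active_tokens(num_prompts, tokens_per_prompt, density_percent):
    """Return (total token positions, positions implied to be active).

    Hypothetical helper: assumes the density was computed over exactly
    num_prompts * tokens_per_prompt token positions.
    """
    total = num_prompts * tokens_per_prompt
    active = total * density_percent / 100.0
    return total, active


total_tokens, active_tokens = implied_active_tokens(
    NUM_PROMPTS, TOKENS_PER_PROMPT, DENSITY_PERCENT
)
print(f"total token positions: {total_tokens:,}")
print(f"implied active positions: about {round(active_tokens)}")
```

Under these assumptions the neuron fires on only a few dozen of roughly three million token positions, so the explanation above rests on a small sample of activating examples.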