Explanations

sabotage, stockpile, boycott, troubleshoot, bomb, vent, spam

The neuron detects words naming hostile or obstructive actions (e.g. sabotage, boycott, ambushed, bombing).

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token            Logit
" think"         -1.55
"</h3>"          -1.52
" メニュー"      -1.52
"re"             -1.52
" była"          -1.48
" затем"         -1.44
" Therefore"     -1.38
"會有"           -1.38
" Frequently"    -1.36
"用戶"           -1.36

Positive Logits

Token            Logit
"”？"            1.72
"lgari"          1.63
(token missing)  1.62
" retten"        1.59
" kritis"        1.58
"</b>"           1.58
"沨"             1.58
" habis"         1.54
"abaikan"        1.52
"INGTON"         1.51

Activations Density 0.038%
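
As a rough sense of scale, the density can be turned into an approximate token count. The sketch below assumes that the density is the share of dashboard tokens on which the neuron has a nonzero activation, and that it was measured over the full set of 24,576 prompts of 128 tokens each. The page states neither, so both are assumptions, and the variable names are illustrative only.

```python
# Figures taken from the Configuration and Activations Density entries.
n_prompts = 24_576
tokens_per_prompt = 128
density_percent = 0.038

# Assumption: density = activating tokens / all dashboard tokens.
total_tokens = n_prompts * tokens_per_prompt
approx_active = round(total_tokens * density_percent / 100)

print(total_tokens, approx_active)  # 3145728 1195
```

Under those assumptions the neuron fires on roughly 1,200 of about 3.1 million tokens, which fits a feature tied to a narrow set of words.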