Explanations

The neuron flags mentions of online harassment techniques, especially doxxing, swatting, and the release of personal information.

Configuration

Prompts (Dashboard): 392,802 prompts, 256 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token             Logit
" Кстати"         0.95
"惟"              0.80
"Consequently"    0.77
"哩"              0.76
" Consequently"   0.72
"ilà"             0.71
"棗"              0.69
"呀"              0.69
" Deshalb"        0.68
"🔡"              0.68

Positive Logits

Token             Logit
" vatandaş"       0.86
" reddit"         0.85
" pokud"          0.82
" subreddit"      0.82
"시고"            0.80
" assh"           0.80
"成为了"          0.79
" societ"         0.79
"썻"              0.78
" justiça"        0.77

Activations Density 0.055%
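
To give a sense of scale for the density figure, the sketch below estimates how many tokens the neuron would fire on. It assumes, and this is only an assumption, that the 0.055% density is measured over the same prompt set listed under Configuration (392,802 prompts of 256 tokens each). The variable names are illustrative and not taken from the dashboard.

```python
# Rough estimate of how many tokens activate the neuron.
# ASSUMPTION: the activation density is computed over the dashboard
# prompt set (392,802 prompts x 256 tokens); the page does not confirm this.

num_prompts = 392_802        # from the Configuration section
tokens_per_prompt = 256      # from the Configuration section
density_percent = 0.055      # "Activations Density 0.055%"

# Total number of tokens in the prompt set.
total_tokens = num_prompts * tokens_per_prompt

# Estimated count of tokens with a nonzero activation.
estimated_active_tokens = total_tokens * density_percent / 100

print(f"total tokens: {total_tokens:,}")
print(f"estimated active tokens: {estimated_active_tokens:,.0f}")
```

Under that assumption the neuron is very sparse: it would fire on roughly 1 token in every 1,800, which fits a feature tied to a narrow topic.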