Explanations

avoiding insults and attacks

The neuron responds strongly to personal address pronouns—especially “you,” “your,” and “my”—marking direct second- and first-person references.

Configuration

Prompts: 392,802 prompts, 256 tokens each

Dataset: monology/pile-uncopyrighted

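Negative Logits

Tokens are quoted so that a leading space, which is part of the token, stays visible.

Token            Value
" feasible"      0.62
" embezz"        0.60
"feasible"       0.59
" viable"        0.57
" prototypes"    0.56
" neglig"        0.55
" prototyp"      0.55
" disaster"      0.55
" disasters"     0.55
" catastrophe"   0.54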

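Positive Logits

Token            Value
" 말투"          1.00   (Korean: "way of speaking, tone")
" আক্রমণ"        0.96   (Bengali: "attack")
" troll"         0.91
"攻撃"           0.91   (Japanese: "attack")
" insults"       0.90
" Disqus"        0.87
" trolling"      0.86
" trolls"        0.86
" оско"          0.86   (Russian word fragment, likely the start of оскорбление, "insult")
" accusing"      0.85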
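Logit-effect lists like the two above are commonly produced by projecting a neuron's output weights onto the model's unembedding matrix and ranking vocabulary tokens by the result. The sketch below shows that idea on toy data. It assumes a plain `w_out @ W_U` projection; `logit_effects`, its arguments, and the optional rescaling are hypothetical and not taken from this dashboard. The displayed values appear to be scaled so the strongest token reads 1.00, and the negative list appears to show magnitudes; both are guesses.

```python
from typing import List, Sequence, Tuple

Scored = List[Tuple[str, float]]


def logit_effects(
    w_out: Sequence[float],
    w_u: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
    normalize: bool = False,
) -> Tuple[Scored, Scored]:
    """Score each vocabulary token by w_out @ W_U (hypothetical helper).

    w_out: the neuron's output weights, length d_model.
    w_u: unembedding matrix, shape d_model x vocab_size.
    Returns the k highest-scoring tokens and the k lowest-scoring tokens.
    """
    d_model = len(w_out)
    # Dot product of the neuron's output direction with each token's unembedding column.
    scores = [
        (tok, sum(w_out[i] * w_u[i][j] for i in range(d_model)))
        for j, tok in enumerate(vocab)
    ]
    if normalize:
        # Assumption: rescale so the largest-magnitude score has magnitude 1.0.
        scale = max(abs(s) for _, s in scores) or 1.0
        scores = [(tok, s / scale) for tok, s in scores]
    ranked = sorted(scores, key=lambda p: p[1], reverse=True)
    top = ranked[:k]
    bottom = sorted(ranked[-k:], key=lambda p: p[1])  # most negative first
    return top, bottom


# Toy example: d_model = 2, vocabulary of four tokens.
vocab = ["a", "b", "c", "d"]
w_out = [1.0, 0.0]
w_u = [[2.0, 1.0, -1.0, -3.0], [5.0, 5.0, 5.0, 5.0]]
top, bottom = logit_effects(w_out, w_u, vocab, k=2)
print(top)     # [('a', 2.0), ('b', 1.0)]
print(bottom)  # [('d', -3.0), ('c', -1.0)]
```

With a real model, `w_out` would be the neuron's row of the MLP output matrix and `w_u` the unembedding. This projection ignores the final layer norm and any later layers, so it is only a first-order view of the neuron's direct effect on the output.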

Activation density: 0.270%
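Activation density is usually read as the fraction of token positions where the neuron fires. The sketch below computes that fraction, assuming "fires" means an activation strictly above a threshold of 0 (the dashboard's actual threshold is not stated). It also shows the token count a 0.270% density would imply if it were measured over every one of the 392,802 × 256 = 100,557,312 configured tokens. Whether the density is computed over the full prompt set or a sample is an assumption. Both helper names are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[Sequence[float]], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`.

    activations: one list of per-token activations per prompt.
    Assumption: a token counts as active when its activation > threshold.
    """
    total = 0
    active = 0
    for prompt in activations:
        for a in prompt:
            total += 1
            if a > threshold:
                active += 1
    return active / total if total else 0.0


def implied_active_tokens(n_prompts: int, tokens_per_prompt: int, density_pct: float) -> float:
    """Active-token count implied if the density held over the whole prompt set."""
    return n_prompts * tokens_per_prompt * density_pct / 100.0


print(activation_density([[0.0, 1.2, 0.0], [0.5, 0.0, 0.0]]))  # 0.333...
print(implied_active_tokens(392_802, 256, 0.270))  # roughly 271,505
```

Under these assumptions, the neuron would be active on roughly 271,500 of the ~100.6 million tokens, which fits the sparse, topic-specific behavior suggested by the logit lists.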