Explanations

scams and attacks

This neuron reliably activates on references to harmful attacks or criminal acts, such as theft, stealing, identity theft, robbery, malware exploits, and related malicious activities.

Configuration

Prompts (Dashboard): 392,802 prompts, 256 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token               Logit
"pyridazine"        0.87
"荷重"              0.79
"濋"                0.78
" applicability"    0.77
"AutoSize"          0.75
"rystall"           0.73
"physiological"     0.72
"补偿"              0.72
"热爱"              0.72
"棹"                0.71

Positive Logits

Token               Logit
" fraudulently"     1.98
" fraudulent"       1.95
" unscrupulous"     1.94
" nefarious"        1.91
" malicious"        1.87
" illegally"        1.84
" maliciously"      1.84
" embezzlement"     1.76
" fraude"           1.75
" deceit"           1.74

Activations Density 2.458%
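
For a rough sense of scale, the prompt count, prompt length, and activation density reported on this page can be combined into an estimated number of tokens on which the neuron fires. This is a minimal sketch, not the dashboard's own computation: it assumes the density was measured over the same 392,802 prompts of 256 tokens each, which the page does not state, and the function and variable names are illustrative.

```python
# Figures copied from this page.
NUM_PROMPTS = 392_802
TOKENS_PER_PROMPT = 256
ACTIVATION_DENSITY = 0.02458  # 2.458%


def estimate_active_tokens(
    num_prompts: int, tokens_per_prompt: int, density: float
) -> tuple[int, int]:
    """Return (total tokens, estimated tokens with nonzero activation).

    Assumption: `density` is the fraction of all dashboard tokens on which
    the neuron activates; the page itself does not confirm this.
    """
    total = num_prompts * tokens_per_prompt
    return total, round(total * density)


total_tokens, active_tokens = estimate_active_tokens(
    NUM_PROMPTS, TOKENS_PER_PROMPT, ACTIVATION_DENSITY
)
print(f"total tokens:     {total_tokens:,}")
print(f"estimated active: {active_tokens:,}")
```

Under that assumption the dashboard covers roughly 100.6 million tokens, of which about 2.47 million would carry a nonzero activation.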