Explanations

negative descriptions or outcomes

This neuron activates strongly on long, multisyllabic content words (e.g. “symptoms,” “reluctance,” “emergency,” “protocol,” “instantly”).

Configuration

Prompts (Dashboard)

392,802 prompts, 256 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

(blank): 0.99
(blank): 0.96
(blank): 0.78
 इत्यादि: 0.78
 Итак: 0.78
 Something: 0.72
 Thus: 0.71
 Announces: 0.71
 इत्यादी: 0.71
 This: 0.69

Positive Logits

跟他: 0.69
 unwitting: 0.67
гант: 0.67
 cocaine: 0.66
拿着: 0.66
 folosit: 0.66
(blank): 0.64
뺨: 0.64
 कथित: 0.63
を使う: 0.63

Activations Density 0.005%
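
To put the density figure in context, the sketch below relates it to the prompt configuration listed above. It assumes that "activations density" means the percentage of tokens on which the unit has a nonzero activation; the page does not state this definition, and the function names are hypothetical, chosen for illustration only.

```python
from typing import Iterable


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of tokens covered by the prompt set."""
    return num_prompts * tokens_per_prompt


def implied_active_tokens(total: int, density_percent: float) -> float:
    """Approximate number of tokens on which the unit fires,
    given a density expressed as a percentage."""
    return total * density_percent / 100.0


def activation_density_percent(activations: Iterable[float],
                               threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: density counts tokens with activation above zero.
    """
    values = list(activations)
    if not values:
        return 0.0
    active = sum(1 for a in values if a > threshold)
    return 100.0 * active / len(values)


if __name__ == "__main__":
    # Figures taken from the page: 392,802 prompts of 256 tokens, density 0.005%.
    n_tokens = total_tokens(392_802, 256)
    print(n_tokens)
    print(round(implied_active_tokens(n_tokens, 0.005)))
```

Under that assumption, 392,802 prompts of 256 tokens give 100,557,312 tokens in total, so a density of 0.005% corresponds to roughly 5,000 active tokens.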