INDEX

Explanations

safety and comfort

The neuron fires on occurrences of “take the time” and its inflected forms, especially in expressions of gratitude such as “thanks for taking the time.”
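
One rough way to sanity-check this explanation against text samples is a plain pattern match. The sketch below is only a hypothetical heuristic: the name `mentions_taking_the_time` and the list of verb forms are assumptions made for illustration, and a regular expression is a stand-in for the neuron, not a description of how it computes its activation.

```python
import re

# Hypothetical heuristic (an assumption, not the neuron itself):
# match common inflections of "take the time", case-insensitively.
TAKE_THE_TIME = re.compile(
    r"\b(?:take|takes|taking|took|taken)\s+the\s+time\b",
    re.IGNORECASE,
)


def mentions_taking_the_time(text: str) -> bool:
    """Return True if the text contains some form of "take the time"."""
    return TAKE_THE_TIME.search(text) is not None


if __name__ == "__main__":
    print(mentions_taking_the_time("Thanks for taking the time to reply."))
    print(mentions_taking_the_time("It takes time to learn."))
```

Comparing where such a pattern matches with where the neuron actually activates would show how much of its behavior the explanation covers.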

Configuration

Prompts (Dashboard)

392,802 prompts, 256 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

 Trouver  0.64
 satisfactorily  0.62
 verfügbar  0.60
 conseguir  0.56
 available  0.55
 able  0.55
 disponíveis  0.53
빈  0.53
 coprime  0.52
 succeed  0.52

Positive Logits

 thanking  0.71
sız  0.59
吐槽  0.57
 acknowledging  0.56
 erlä  0.56
 congratulated  0.55
ওর  0.55
 explaining  0.55
 thanked  0.54
 اقول  0.53

Activations Density 0.099%
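
To put the density in perspective, it can be combined with the prompt configuration above. The arithmetic below is a sketch that assumes the density is the fraction of all tokens in the 392,802-prompt set on which the neuron has a nonzero activation. The page does not state that definition, so the result is only an order-of-magnitude estimate.

```python
# Figures taken from the dashboard configuration.
NUM_PROMPTS = 392_802
TOKENS_PER_PROMPT = 256
DENSITY = 0.099 / 100  # 0.099% expressed as a fraction

# Total tokens in the prompt set.
total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT

# Assumption: density = (tokens with nonzero activation) / (total tokens).
approx_active_tokens = round(total_tokens * DENSITY)

if __name__ == "__main__":
    print(f"total tokens: {total_tokens:,}")
    print(f"approx. active tokens: {approx_active_tokens:,}")
```

Under that assumption the prompt set holds 100,557,312 tokens, and the neuron would be active on roughly one token in a thousand, on the order of 100,000 tokens in total.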