
Explanations

The neuron seems to focus on specific grammatical structures or punctuation that follow certain words:
- `that` followed by `her`, `'`, or `that`.
- `.` followed by `Haven`, `of`, or `of`.
- `of` followed by `of`, `h`, or `.`.

The `TOP_POSITIVE_LOGITS` are a mix of characters and short text fragments and do not immediately form a clear semantic pattern.

The `TOP_ACTIVATING_TEXTS` show examples of:
- Explaining possibilities ("one specific way he could have learned it. However, he could have")
- Describing emotional states or coping mechanisms ("struggle to regulate their emotions effectively. They might feel overwhelmed by sadness,", "unfulfilling or overwhelming. * **Simply Haven't Found the Right Person:")
- Explaining a scenario ("She might be misinterpreting the rejection. Perhaps she thought there was a stronger connection than there was.")
- Conversational phrases ("You know, I'm surprised you haven't gotten a medal! A little something for showing up dressed.")
- Explaining an interpretation ("We have to reinterpret "Mafia" to fit this structure. Possible Interpretations of")
- Explaining a consequence ("even if it had unintended negative consequences. You're not accepting blame because your motivation was positive. 4. Disclaiming Responsibility")

Looking at `MAX_ACTIVATING_TOKENS` and `TOKENS_AFTER_MAX_ACTIVATING_TOKEN`:
`that` -> `her` (
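The "X followed by Y" pairs above come from pairing each example's max-activating token with the token right after it. A minimal sketch of that pairing is below, assuming each example is a list of tokens with one activation per token; the `Example` class and the `max_token_and_next` / `count_pairs` helpers are hypothetical names for illustration, not the dashboard's actual code.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Example:
    """Hypothetical container: one prompt's tokens and per-token activations."""
    tokens: List[str]
    activations: List[float]


def max_token_and_next(example: Example) -> Tuple[str, Optional[str]]:
    """Return the max-activating token and the token right after it (None at the end)."""
    i = max(range(len(example.activations)), key=example.activations.__getitem__)
    following = example.tokens[i + 1] if i + 1 < len(example.tokens) else None
    return example.tokens[i], following


def count_pairs(examples: List[Example]) -> Counter:
    """Tally (max token, next token) pairs across examples to surface patterns."""
    return Counter(max_token_and_next(ex) for ex in examples)


# Toy data, invented for illustration only.
demo = [
    Example(["she", "said", "that", "her"], [0.0, 0.1, 2.5, 0.3]),
    Example(["one", "of", "of"], [0.2, 1.9, 0.4]),
]
print(count_pairs(demo))
```

Counting pairs rather than listing them makes repeated patterns (such as `that` -> `her`) stand out once many examples are collected.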


Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1


Negative Logits

 tributaries            1.20
 Criticism              1.20
 tannins                1.17
 Conclusions            1.16
(token not displayed)   1.15
 organs                 1.13
 softener               1.13
 grievances             1.13
 wrongs                 1.13
 ascribe                1.13

Positive Logits

(token not displayed)   2.22
(token not displayed)   2.16
(token not displayed)   1.97
ر                       1.92
om                      1.80
(token not displayed)   1.79
it                      1.77
ről                     1.77
ار                      1.76
an                      1.73
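The page does not say how these logit values are computed. One common approach, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and read off the highest and lowest entries. The sketch below uses hypothetical inputs `decoder_dir` (shape `d_model`) and `W_U` (shape `d_model x vocab_size`) and ignores any final layer norm the real model may apply.

```python
from typing import List, Sequence, Tuple

import numpy as np


def feature_logit_effects(
    decoder_dir: np.ndarray,
    W_U: np.ndarray,
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative direct logit effects.

    Assumption: effect on each vocab token = decoder_dir @ W_U (no layer norm).
    """
    logits = decoder_dir @ W_U              # shape: (vocab_size,)
    order = np.argsort(logits)              # ascending order of logit value
    top = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    bottom = [(vocab[i], float(logits[i])) for i in order[:k]]
    return top, bottom


# Toy example: identity unembedding over a 3-token vocabulary.
top, bottom = feature_logit_effects(
    np.array([0.5, -1.0, 2.0]), np.eye(3), ["a", "b", "c"], k=1
)
print(top, bottom)
```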

Activation Density: 0.001%
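Activation density is read here as the fraction of tokens on which the feature fires. That reading, the zero threshold, and the use of the full dashboard set (238,145 prompts x 512 tokens) as the denominator are all assumptions. Under them, 0.001% corresponds to roughly 1,200 active tokens. The helper `activation_density` is a hypothetical name for illustration.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Assumed denominator: every token in the dashboard prompt set.
total_tokens = 238_145 * 512                     # 121,930,240 tokens
approx_active = round(total_tokens * 0.001 / 100)
print(total_tokens, approx_active)
```

Because the displayed 0.001% is probably rounded, the count of about 1,219 tokens should be taken as an order of magnitude only.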