Explanations

The lists provided (`MAX_ACTIVATING_TOKENS` and `TOKENS_AFTER_MAX_ACTIVATING_TOKEN`) suggest a pattern where the primary token is often followed by a digit. However, the `TOP_POSITIVE_LOGITS` and `TOP_ACTIVATING_TEXTS` seem to point towards specific programming keywords, technical terms, or single-letter identifiers followed by a number or another identifier.

Let's re-evaluate based on the common elements.

- `MAX_ACTIVATING_TOKENS`: `/`, `Y`, `C`, `Kotlin`, `Laser`, `K`, `GPU`, `GR`, `No`, `Blue`
- `TOKENS_AFTER_MAX_ACTIVATING_TOKEN`: `2`, `Y`, `C`, `Kotlin`, `Laser`, `K`, `GPU`, `GR`, `No`, `Blue`

This is very peculiar. It looks like the `MAX_ACTIVATING_TOKENS` and `TOKENS_AFTER_MAX_ACTIVATING_TOKEN` lists are identical or very similar, which defeats the purpose of looking at "tokens after".

2-letter abbreviations and capitalized words
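The observation that the two token lists are nearly identical can be checked directly. The following is a minimal Python sketch, not part of the dashboard itself: the list contents are copied from the explanation above, while the helper name `positional_overlap` and the variable names are hypothetical and chosen only for illustration.

```python
# Token lists copied from the explanation text above.
max_activating_tokens = ["/", "Y", "C", "Kotlin", "Laser", "K", "GPU", "GR", "No", "Blue"]
tokens_after = ["2", "Y", "C", "Kotlin", "Laser", "K", "GPU", "GR", "No", "Blue"]


def positional_overlap(a, b):
    """Return the fraction of positions at which both lists hold the same token.

    Hypothetical helper for illustration; assumes equal-length lists.
    """
    if len(a) != len(b):
        raise ValueError("lists must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


overlap = positional_overlap(max_activating_tokens, tokens_after)

# Positions where the two lists disagree, as (index, token, token_after) tuples.
differing = [
    (i, x, y)
    for i, (x, y) in enumerate(zip(max_activating_tokens, tokens_after))
    if x != y
]

print(overlap)    # 0.9
print(differing)  # [(0, '/', '2')]
```

Under these assumptions the lists agree at nine of ten positions and differ only in the first entry (`/` versus `2`), which is consistent with the explanation's remark that the "tokens after" list carries little independent information.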

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

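| Token | Logit |
| --- | --- |
| `blooded` | 2.30 |
| `ars` | 2.03 |
| `我` | 1.98 |
| `닭` | 1.87 |
| `Ook` | 1.83 |
| `zelfde` | 1.82 |
| `meye` | 1.81 |
| ` pesky` | 1.81 |
| ` maanden` | 1.80 |
| `ara` | 1.80 |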

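Positive Logits

| Token | Logit |
| --- | --- |
| `]//` | 1.86 |
| `](\` | 1.85 |
| `И` | 1.70 |
| ` ограни` | 1.70 |
| `)$}` | 1.70 |
| `)$` | 1.66 |
| ` использу` | 1.65 |
| `></` | 1.63 |
| `\}=\` | 1.59 |
| (token not shown) | 1.58 |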

Activations Density 0.233%