INDEX

Explanations

lists or asks questions

high-frequency function words and structural/formatting tokens (e.g., articles, prepositions, modals, punctuation, and control/section markers).

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

あくまで	0.29
 subtlety	0.27
 intégr	0.27
 Anzahl	0.26
 playmaker	0.26
 červ	0.25
 மொத்தம்	0.25
 Mutations	0.25
wechsl	0.24
 Holds	0.24

Positive Logits

риа	0.26
様専用	0.25
ларда	0.25
ῶν	0.24
MENTS	0.23
ेलर	0.23
ιου	0.23
اريات	0.23
다가	0.23
ικού	0.23

Activations Density 0.556%