Explanations

concluding or contrasting statements

tokens that appear in the model's long, formal explanatory or instructional responses (i.e., assistant-generated detailed exposition).

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

" 이건"          0.21
" jeans"        0.20
" họ"           0.20
"это"           0.20
" 이거"          0.20
" swirls"       0.19
" headlights"   0.19
"Didn"          0.19
" bunnies"      0.19
" combos"       0.19

Positive Logits

" Therefore"      0.52
"Therefore"       0.49
" Consequently"   0.48
" Furthermore"    0.48
" Accordingly"    0.43
" Поэтому"        0.42
" therefore"      0.42
" Moreover"       0.41
"Consequently"    0.41
"因此"             0.40

Activations Density 0.086%
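
As a rough sanity check, the density figure can be turned into an approximate token count. The sketch below assumes that activation density means the share of dashboard tokens on which the feature has a nonzero activation, and that it is measured over every token of the prompts listed under Configuration. Neither assumption is stated on this page, and the function name is hypothetical.

```python
def estimated_active_tokens(num_prompts: int,
                            tokens_per_prompt: int,
                            density_percent: float) -> int:
    """Estimate how many tokens a feature fires on.

    Assumes density_percent is the percentage of all dashboard tokens
    with a nonzero activation (an assumption, not confirmed by the page).
    """
    total_tokens = num_prompts * tokens_per_prompt
    return round(total_tokens * density_percent / 100)


# Figures taken from the Configuration and Activations Density fields.
total_tokens = 238_145 * 512
active_tokens = estimated_active_tokens(238_145, 512, 0.086)
print(f"{total_tokens:,} tokens in total, about {active_tokens:,} active")
```

Under these assumptions the feature is active on roughly 105 thousand of about 122 million tokens, which fits a feature tied to a narrow class of discourse connectives.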