Explanations

Tokens from the model/assistant's reply, especially self-referential or help/clarification phrases (the assistant speaking).

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

 iteratively    0.55
 workable    0.52
🛠    0.51
 применять    0.50
 metodologia    0.50
深入    0.50
 ሂደ    0.50
ড়ান্ত    0.49
 CFRP    0.48
 дета    0.48

Positive Logits

 😊    0.77
 smiley    0.68
😊    0.66
☺️    0.66
:)    0.64
 なさい    0.63
or    0.63
 kawaii    0.63
赤ちゃん    0.62
 어린이    0.61

Activations Density 0.029%
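
To give a rough sense of scale for the density figure, the sketch below converts it into an approximate count of token positions. It assumes (this is not stated on the page) that the density is the fraction of token positions in the dashboard prompt set on which the feature fires, and that every prompt contributes exactly 512 tokens. The function name and constants are illustrative only.

```python
# Assumption: activation density = fraction of token positions in the
# dashboard prompt set where the feature is active. The page does not
# confirm that the density was measured on this exact prompt set.
NUM_PROMPTS = 238_145            # from "Prompts (Dashboard)"
TOKENS_PER_PROMPT = 512          # from "Prompts (Dashboard)"
ACTIVATION_DENSITY = 0.029 / 100 # 0.029% expressed as a fraction


def estimate_active_tokens(num_prompts: int, tokens_per_prompt: int, density: float):
    """Return (total token positions, estimated active positions)."""
    total_tokens = num_prompts * tokens_per_prompt
    active_tokens = round(total_tokens * density)
    return total_tokens, active_tokens


total_tokens, active_tokens = estimate_active_tokens(
    NUM_PROMPTS, TOKENS_PER_PROMPT, ACTIVATION_DENSITY
)
print(f"total token positions: {total_tokens:,}")
print(f"estimated active positions: {active_tokens:,}")
```

Under these assumptions the feature would be active on roughly 35 thousand of about 122 million token positions, which is an order-of-magnitude estimate rather than a reported result.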