Explanations

AI safety guidelines

Tokens that occur in the assistant's generated responses, i.e., parts of the assistant's voice, especially safety-related, explanatory, or conversational reply text.

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

Token (quoted to show leading spaces)    Logit
"emorrh"            0.47
"ന്ത്രാ"            0.43
" पारी"             0.43
"padStart"          0.43
"ictionaries"       0.42
"ServerError"       0.41
" Conversely"       0.40
" palmit"           0.39
"ndrome"            0.38
" sucesivamente"    0.38

Positive Logits

Token (quoted to show leading spaces)    Logit
"غير"               0.47
"غیر"               0.45
"釈"                0.45
" Dairy"            0.44
" بود"              0.43
" Tessa"            0.42
" Sugar"            0.42
" ویس"              0.41
"ですが"            0.40
"ﻁ"                 0.40

Activations Density 14.515%