Explanations

pronoun followed by verb phrase

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

 violations    0.50
 aviation    0.48
unting    0.47
か    0.46
essä    0.45
 seepage    0.44
 प्रतिकूल    0.44
そば    0.43
ageddon    0.42
씩    0.42

Positive Logits

for    0.50
з    0.44
 cleanly    0.42
 entspre    0.42
ેલ    0.42
 compon    0.41
ونم    0.41
 Gilroy    0.41
ﻷ    0.41
 definitively    0.41

Activations Density 0.002%