Explanations

argues, suggests, states, attributes

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

Token (quoted to preserve leading spaces)    Value
" normalerweise"    0.81
"이고"    0.69
" adalah"    0.65
" meets"    0.65
"ж"    0.64
" dikenal"    0.64
" berdiri"    0.63
" தேர்ந்தெடுக்க"    0.62
"狽"    0.62
" biasa"    0.62

Positive Logits

Token (quoted to preserve leading spaces)    Value
"他说"    1.68
" argues"    1.60
"他說"    1.56
" According"    1.53
"According"    1.48
" argued"    1.46
"उन्होंने"    1.44
" તેમણે"    1.42
"emphas"    1.42
" citing"    1.41

Activations Density 0.059%
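
To put the density figure in context, the arithmetic below is a rough sketch, assuming that the density is the fraction of token positions on which the feature fires and that it was measured over the same prompt set listed under Configuration. Neither assumption is confirmed by the dashboard, and the function names are illustrative only.

```python
# Figures copied from the dashboard; everything else is an assumption.
NUM_PROMPTS = 238_145        # "238,145 prompts"
TOKENS_PER_PROMPT = 512      # "512 tokens each"
DENSITY_PERCENT = 0.059      # "Activations Density 0.059%"


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total token positions in the prompt set."""
    return num_prompts * tokens_per_prompt


def expected_active_tokens(total: int, density_percent: float) -> int:
    """Approximate number of positions where the feature is active,
    ASSUMING density = active positions / total positions."""
    return round(total * density_percent / 100)


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = expected_active_tokens(total, DENSITY_PERCENT)

print(total)   # 121930240
print(active)  # 71939
```

Under these assumptions the feature would be active on roughly 72 thousand of about 122 million token positions, which is consistent with a sparse feature tied to attribution phrasing such as "argues" or "According".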