Explanations

defending, protecting, threats, providing, purchased

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

 deja    0.61
。    0.58
।    0.57
،    0.56
 {}'.    0.56
 misinterpreted    0.56
 déjà    0.55
 {}".    0.55
 nenhum    0.55
$+    0.54

Positive Logits

 peasants    0.65
для    0.64
untuk    0.63
 robbers    0.57
 для    0.57
是为了    0.57
 为了    0.55
 communal    0.55
 apprentices    0.55
预    0.55

Activations Density 0.002%