Explanations

hedging language containing words like 'some' or 'certain'

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each
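
To put the prompt configuration in perspective, here is a minimal sketch of the token budget it implies. It assumes every prompt is exactly 128 tokens long (no padding or truncation), which the page does not state; the function name is illustrative only.

```python
# Figures copied from the dashboard configuration.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total token positions, assuming every prompt is exactly full length."""
    return num_prompts * tokens_per_prompt


TOTAL = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
print(f"{TOTAL:,} token positions")  # 3,145,728 token positions
```

Under that assumption, the dashboard statistics are drawn from roughly 3.1 million token positions.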

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

PreferredItem    -1.04
UnsafeEnabled    -0.94
 насељу    -0.91
 تضيفلها    -0.91
 ſind    -0.90
ніципалі    -0.90
Efq    -0.88
 Monfieur    -0.88
 betweenstory    -0.88
ThroughAttribute    -0.87

Positive Logits

(token not shown)    0.48
(token not shown)    0.47
isto    0.46
St    0.45
<strong>    0.45
(token not shown)    0.44
</th>    0.43
<b>    0.42
خد    0.42

Activations Density 0.044%
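
As a rough sanity check on the density figure, the sketch below estimates how many token positions the feature fires on. It assumes the density is the fraction of positions with a nonzero activation, measured over the 24,576 prompts of 128 tokens each listed under Configuration. The page confirms neither assumption, so the result is an order-of-magnitude estimate only.

```python
# Figures copied from the dashboard; how they relate is an assumption.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.044


def estimated_active_positions(
    num_prompts: int, tokens_per_prompt: int, density_percent: float
) -> int:
    """Estimate the count of token positions with a nonzero activation.

    Assumes density = active positions / total positions.
    """
    total_positions = num_prompts * tokens_per_prompt
    return round(total_positions * density_percent / 100)


ACTIVE = estimated_active_positions(NUM_PROMPTS, TOKENS_PER_PROMPT, DENSITY_PERCENT)
print(ACTIVE)  # 1384
```

If the assumption holds, the feature fires on about 1,400 of roughly 3.1 million positions, which fits a sparse feature tied to a narrow pattern such as hedging words.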