Explanations

acting against or targeting

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token             Logit
"was"             -2.81
"”"               -2.53
(not rendered)    -2.44
"</h2>"           -2.44
"is"              -2.14
"遑"              -2.08
(not rendered)    -2.06
(not rendered)    -2.03

Positive Logits

Token             Logit
" dezelve"        2.83
"CHREIB"          2.78
" estadounid"     2.53
"ﮯ"               2.47
"Uwaga"           2.45
" zoude"          2.44
"zestaw"          2.42
(not rendered)    2.42
"Warto"           2.42
" hunne"          2.41

Activations Density 0.021%
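
To put the density figure in context, the sketch below works out how many tokens the dashboard sample contains and roughly how many of them would activate this feature. It assumes the 0.021% density is measured over the same 24,576 prompts of 128 tokens listed under Configuration; the page does not state this explicitly. The function and variable names are illustrative only.

```python
# Figures quoted on the dashboard page.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY_PERCENT = 0.021


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Number of token positions in the dashboard sample."""
    return num_prompts * tokens_per_prompt


def expected_active_tokens(n_tokens: int, density_percent: float) -> float:
    """Approximate count of token positions where the feature is active.

    Assumption: the density is the fraction of token positions with a
    nonzero activation, expressed as a percentage of the same sample.
    """
    return n_tokens * density_percent / 100.0


n_tokens = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
n_active = expected_active_tokens(n_tokens, ACTIVATION_DENSITY_PERCENT)

print(f"Total tokens in sample: {n_tokens:,}")
print(f"Approximate active tokens: {n_active:.0f}")
```

Under that assumption the feature fires on only a few hundred of roughly three million token positions, about one in every 4,800.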