Explanations

code files, gitignore

Negative Logits

Token          Logit
"客服"         -0.09
" Diana"       -0.08
"文字"         -0.07
" прет"        -0.07
"urde"         -0.07
" preconce"    -0.07
"文学"         -0.07
" promo"       -0.07

Positive Logits

Token              Logit
" confidential"    0.11
"excluded"         0.11
" excluded"        0.10
" geschützt"       0.10
" niemand"         0.09
"/private"         0.09
"Private"          0.09
"SECRET"           0.09
" दुर्�"             0.09
"Secrets"          0.09

Activations Density: 0.003%