
Explanations

Appearances may be deceptive

Negative Logits

Token            Logit
" Clamp"         -0.08
" exclusiva"     -0.07
" compartilh"    -0.07
" opio"          -0.07
" klimaat"       -0.07
".Clamp"         -0.07
" exclusivo"     -0.07
"Clamp"          -0.07
" clamp"         -0.07
" lähe"          -0.07

Positive Logits

Token            Logit
" deceptive"     0.19
" decept"        0.16
" superfic"      0.14
" dece"          0.14
" aparent"       0.14
" enga"          0.13
"隐藏"           0.13
" disguised"     0.13
" disguis"       0.13
" deception"     0.13

Activation density: 0.078%