INDEX

Explanations

Dishonest weight problems

Configuration

Dataset (Dashboard)

Various

Negative Logits

Token               Logit
" varsa"            -0.07
" hobbies"          -0.07
" erwähnt"          -0.07
"ingers"            -0.07
" resonate"         -0.07
".Setter"           -0.07
">,↵"               -0.07
" взаимодейств"     -0.07
" avoid"            -0.07
" isempty"          -0.07

Positive Logits

Token               Logit
" fooled"           0.11
" looph"            0.11
" fraudulent"       0.10
" quantità"         0.10
" counterfeit"      0.10
" deceptive"        0.10
" dishonest"        0.10
" deceit"           0.10
" misleading"       0.10
"claimed"           0.09

Activations Density: 0.022%