Explanations

mimicry and camouflage

Negative Logits

Token         Logit
" Mark"       -0.08
" okuva"      -0.08
"ાયત"         -0.08
" सेहो"       -0.08
" fòrça"      -0.08
"้อม"          -0.07
" marking"    -0.07
" mondial"    -0.07
"_mark"       -0.07
"ayam"        -0.07

Positive Logits

Token         Logit
".safe"       0.09
" locally"    0.08
"_SEC"        0.08
" insign"     0.08
"Match"       0.08
".ir"         0.08
"ithe"        0.08
" harmless"   0.08
"izzly"       0.08
".matches"    0.07

Activations Density 0.016%