Explanations

instructions to language model

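Negative Logits (tokens are quoted because a leading space is part of the token)

Logit   Token
-0.09   " rents"
-0.09   "_CTRL"
-0.08   " Truly"
-0.08   " asap"
-0.08   " ****************************************"
-0.08   "riol"
-0.08   " okum"
-0.08   " réellement" (French: "really")
-0.08   " truly"
-0.08   " someday"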

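Positive Logits

Logit   Token
 0.13   " seemingly"
 0.11   " presumably"
 0.11   " apparently"
 0.09   " seeming"
 0.09   " schein" (German stem, as in "scheinen", "to seem")
 0.09   " aparentemente" (Spanish/Portuguese: "apparently")
 0.09   " seems"
 0.08   " offenbar" (German: "apparently")
 0.08   " stating"
 0.08   " seem"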
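The two lists above rank vocabulary tokens by how strongly the feature pushes their logits up or down. Below is a minimal sketch of how such a ranking is often produced. It assumes each token's score is the dot product of the feature's decoder direction with that token's column of the unembedding matrix. The dashboard may apply extra steps (for example normalization) that are not shown here. The names `decoder_dir`, `unembed`, and `vocab`, and all numbers, are hypothetical.

```python
def logit_effects(decoder_dir, unembed, vocab):
    """Return {token: score} with score = dot(decoder_dir, unembed[:, j]).

    Assumption: unembed is a d_model x vocab_size matrix stored as a list of rows.
    """
    scores = {}
    for j, token in enumerate(vocab):
        scores[token] = sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
    return scores


def top_positive_negative(scores, k):
    """Return (k highest-scoring tokens, k lowest-scoring tokens) as (token, score) pairs."""
    ranked = sorted(scores.items(), key=lambda item: item[1])
    positive = ranked[::-1][:k]  # largest score first
    negative = ranked[:k]        # most negative score first
    return positive, negative


# Toy data (hypothetical): 2-dimensional residual stream, 4-token vocabulary.
vocab = [" seemingly", " presumably", " truly", " cat"]
decoder_dir = [1.0, 0.0]
unembed = [
    [0.13, 0.11, -0.08, 0.0],
    [0.50, 0.50, 0.50, 0.5],
]

positive, negative = top_positive_negative(logit_effects(decoder_dir, unembed, vocab), k=2)
print(positive)
print(negative)
```

Because only the sign and size of each dot product matter for the ranking, the same scores give both lists: the highest scores form the positive list, and the lowest form the negative list.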

Activation density: 0.126% (the feature is active on roughly 1.3 of every 1,000 tokens)
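Activation density is read here as the share of token positions on which the feature is active. Below is a minimal sketch that assumes "active" means an activation strictly greater than zero. The function name and the sample data are hypothetical.

```python
def activation_density(activations):
    """Percentage of token positions whose activation is > 0 (assumed firing threshold)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > 0)
    return 100.0 * active / len(activations)


# Hypothetical sample: one active position out of 1,000 tokens.
sample = [0.0] * 999 + [2.3]
print(f"Activation density: {activation_density(sample):.3f}%")
```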