Explanations

say "instruction"

Negative Logits

Token         Logit
"芽"          -0.09
" Clube"      -0.08
"еди"         -0.08
" пики"       -0.08
" bayi"       -0.08
" koup"       -0.08
".sqrt"       -0.08
" mutants"    -0.08
"Afee"        -0.08
"emers"       -0.08

Positive Logits

Token               Logit
" instruction"      0.57
"_instruction"      0.53
"Instruction"       0.52
"instruction"       0.52
" Instruction"      0.52
" instruk"          0.45
" инструкция"       0.44
" инструк"          0.44
" instructions"     0.42
" instrucciones"    0.42

Activations Density 0.044%