INDEX

Explanations

2

The neuron strongly responds to the start-of-text token, i.e., the beginning of a sequence.

Negative Logits

Token  Logit
" Gone"  -0.08
"須"  -0.08
" مرح"  -0.08
" Wanted"  -0.08
" bumps"  -0.08
"회"  -0.08
" cornerstone"  -0.08
"/rem"  -0.08
" births"  -0.08
"�"  -0.07

Positive Logits

Token  Logit
" vergelijking"  0.08
" portátil"  0.08
"出去"  0.08
" envelop"  0.07
"pad"  0.07
" groot"  0.07
"fb"  0.07
"Ub"  0.07
"fw"  0.07
" grip"  0.07

Activations Density 0.221%
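
A density figure like the one above is commonly read as the fraction of token positions on which the neuron fires. The page does not say how its value is computed, so the sketch below is only one plausible reading. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions with a nonzero
    (above-threshold) activation. The dashboard may define it differently.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the neuron fires on 2 of 1000 token positions.
sample = [0.0] * 998 + [3.2, 1.7]
print(f"{activation_density(sample):.3%}")
```

Under this reading, a density of 0.221% would mean the neuron is active on roughly 2 of every 1000 tokens, which fits a neuron that responds mainly to a single start-of-text token per sequence.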