    Explanations

    "the" followed by specific nouns

    No tokens activate this neuron; it is effectively inactive.

    Negative Logits
    Token                        Value
    "\]"                         0.26
    "T"                          0.25
    ";"                          0.24
    "2"                          0.22
    ".]"                         0.21
    (token missing in source)    0.21
    "I"                          0.21
    "^{*}"                       0.21
    "𝗔"                          0.20
    "1"                          0.20
    Positive Logits
    Token          Value
    " to"          0.30
    " algunos"     0.23
    " soldats"     0.22
    "kprop"        0.22
    " exemplu"     0.21
    " of"          0.21
    "bardziej"     0.21
    " dược"        0.21
    " актриса"     0.21
    " at"          0.21
    Activation Density: 0.012%

    No Known Activations