    Explanations

    The neuron activates on occurrences of the word “Paradise” (i.e. the token sequence spelling out “Paradise”).

    Negative Logits
    Token          Logit
     elekt         -0.07
                   -0.07
    _ke            -0.06
     вважа         -0.06
    ugador         -0.06
     bow           -0.06
     ecs           -0.06
     electrodes    -0.06
    _Control       -0.06
    thro           -0.06
    Positive Logits
    Token          Logit
     Paradise      0.14
     paradise      0.12
     Eden          0.09
     oasis         0.09
     haven         0.09
     Oasis         0.08
    istence        0.08
    공지           0.07
    abee           0.07
    ره             0.07
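The page does not say how the logit lists are computed. One common approach, sketched here as an assumption rather than as this dashboard's method, is to take the dot product of the unit's output direction with each token's unembedding vector and then sort. The function names, token names, and vectors below are all hypothetical toy values chosen for illustration.

```python
from typing import Dict, List, Tuple


def logit_effects(direction: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of the output direction with each token's unembedding vector."""
    return {
        tok: sum(d * w for d, w in zip(direction, vec))
        for tok, vec in unembed.items()
    }


def top_k(effects: Dict[str, float], k: int,
          largest: bool = True) -> List[Tuple[str, float]]:
    """Return the k tokens with the largest (or smallest) logit effect."""
    return sorted(effects.items(), key=lambda kv: kv[1], reverse=largest)[:k]


# Hypothetical 2-dimensional toy model; not taken from the dashboard.
direction = [1.0, 0.5]
unembed = {
    "alpha": [0.10, 0.08],
    "beta": [0.07, 0.04],
    "gamma": [-0.06, -0.02],
    "delta": [-0.05, -0.02],
}

effects = logit_effects(direction, unembed)
positive = top_k(effects, 2, largest=True)   # tokens the unit promotes most
negative = top_k(effects, 2, largest=False)  # tokens the unit suppresses most
```

Under this reading, the positive list holds the tokens whose logits the unit raises most when it fires, and the negative list holds the tokens it lowers most.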
    Activation Density: 0.007%
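The page does not define the density figure. Assuming it means the fraction of sampled tokens on which the unit has a nonzero activation, 0.007% corresponds to about 7 active tokens per 100,000. A minimal sketch of that arithmetic follows; `activation_density` and the sample data are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float],
                       threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds the threshold."""
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic sample: 7 active tokens out of 100,000 (illustrative only).
acts = [0.0] * 99_993 + [1.5] * 7
density = activation_density(acts)
print(f"{density:.3%}")
```

A density this low would mean the unit fires on very few tokens, which fits an explanation tied to a single word.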

    No Known Activations