    Explanations

    The token that marks the model's (assistant's) turn in a chat transcript.

    Negative Logits
     ESM            0.33
     пользователя   0.33
    ्योर            0.31
     रव             0.31
     большой        0.31
     úrov           0.31
     большим        0.30
                    0.30
    Cuánto          0.30
                    0.30
    Positive Logits
    answer          0.35
    Sources         0.33
    Wikipedia       0.33
     Sources        0.31
     answer         0.31
    sources         0.31
    Answer          0.30
    Mutual          0.30
     Guinness       0.29
    Sadly           0.29
    Activation Density: 0.006%

    No Known Activations