INDEX
    Explanations

    the assistant’s turn marker in chat-style transcripts (i.e., the model speaker tag)

    Negative Logits
        " payload"          0.36
        " STM"              0.36
        "rq"                0.35
        " Application"      0.35
        " applying"         0.35
        " generated"        0.35
        " benchmark"        0.34
        " तक्र"              0.33
        "heb"               0.33
        " aplicar"          0.33
    Positive Logits
        "โร"                0.33
        "来说"              0.30
        "்கலை"              0.29
        " 사람들이"         0.29
        " свадь"            0.29
        " সীমান্ত"          0.29
        " 今天"             0.28
        " разговари"        0.28
        "ServletException"  0.28
        "ഴിലാ"              0.28
    Act Density 0.124%

    No Known Activations