    Explanations

    explaining or giving examples

    requests and content related to programming/code (including code blocks and technical prompts), often spiking around conversation turn-boundary tokens.

    NEGATIVE LOGITS
    " ওই"  0.44
    " apparently"  0.38
    "apparently"  0.36
    "OffsetY"  0.35
    " vendar"  0.35
    " miatt"  0.33
    " arginine"  0.33
    "RefreshToken"  0.33
    "회가"  0.32
    " নাকি"  0.32
    POSITIVE LOGITS
    " பொதுவாக"  0.52
    " สำหรับ"  0.51
    " সাধারণত"  0.49
    "สำหรับ"  0.47
    " Examples"  0.46
    "Для"  0.45
    " 다양한"  0.43
    " Typically"  0.43
    " Descripción"  0.43
    " Beispiele"  0.42
    Activation Density: 0.406%

    No Known Activations