    Explanations

    Assistant-style boilerplate that introduces options, bullet-pointed rewrites, or explanations in instructional responses.

    Negative Logits
    " تفصیل"  0.43
    "перь"  0.41
    "itement"  0.40
    " açıklam"  0.39
    " panoram"  0.39
    " panor"  0.37
    " raccont"  0.37
    "explique"  0.37
    " terakhir"  0.36
    " 자랑"  0.36
    Positive Logits
    " here"  0.86
    "Here"  0.83
    " Here"  0.78
    "here"  0.76
    " aquí"  0.64
    " यहां"  0.62
    " aqui"  0.62
    " هنا"  0.61
    " Aquí"  0.61
    "heres"  0.61
    Activation Density: 0.040%

    No Known Activations