    Explanations

    Phrases that mark assistant turn-taking and conversational preambles, signaling the model’s persona and setting up a response in chat-style dialogue.

    Negative Logits
    Token              Logit
    " nutritive"       0.27
    " fertilizers"     0.27
    " pollinators"     0.27
    " abiotic"         0.26
    " ATMs"            0.26
    " toxins"          0.26
    " pointers"        0.26
    " antioxidants"    0.26
    " alleles"         0.26
    " legis"           0.26
    Positive Logits
    Token              Logit
    " jsem"            0.44
    "我现在"           0.41
                       0.40
    "please"           0.37
    "আমি"              0.36
    "我已经"           0.35
    "Please"           0.35
    "मैंने"             0.35
    "Hello"            0.35
    " मैंने"            0.35
    Activation Density: 1.756%

    No Known Activations