INDEX
    Explanations

    conversational cues, questions, and claims that reflect uncertainty or seek clarification

    Negative Logits
     Мексичка          -0.80
     snippetHide       -0.79
    UserScript         -0.77
    DoubleQuotes       -0.74
    TagMode            -0.72
    Rüyada             -0.70
    :");↵              -0.70
    (blank)            -0.69
    InputTagHelper     -0.69
    !")↵               -0.68
    Positive Logits
    ↵↵                 0.68
    <eos>              0.67
    (blank)            0.64
     Good              0.61
     I                 0.55
    (blank)            0.52
     Looks             0.52
    Good               0.52
    Super              0.51
    edit               0.50
    Activation Density 0.271%

    No Known Activations