INDEX
    Explanations

    descriptive, companion, loads, vast, Prints

    Activates on tokens within long, substantive instructional or explanatory responses generated by the assistant model.

    Negative Logits
     আলোচ
    0.46
     Palash
    0.45
    кои
    0.44
    ToProps
    0.43
    )».
    0.43
    assapi
    0.43
     बालिका
    0.42
    什麼
    0.42
    這邊
    0.42
     ബാല
    0.42
    Positive Logits
    '
    0.46
       
    0.40
    <body>
    0.39
    g
    0.39
     குழு
    0.39
    0.38
     dit
    0.37
     passos
    0.37
     Group
    0.37
                                   
    0.36
    Act Density 0.701%
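    For context on the density figure, a minimal sketch of how such a number could be computed, assuming activation density means the fraction of token positions on which the feature's activation exceeds a threshold. The function name `activation_density`, the zero threshold, and the sample data are hypothetical and not taken from this dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" is read here as (active positions) / (all positions).
    The dashboard's own definition may differ.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # Hypothetical data: 1000 token positions, 7 of which fire.
    sample = [0.0] * 993 + [0.5] * 7
    print(f"Act Density {activation_density(sample):.3%}")
```

    Under this reading, a density of 0.701% would mean the feature fires on roughly 7 of every 1000 tokens sampled.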

    No Known Activations