INDEX
    Explanations

    specific phrases or keywords related to instructions or prompts

    sentence starters or common phrases indicative of dialogue and narrative structure

    Negative Logits
    Token         Logit
    " }}"         -0.49
    " [|"         -0.47
    " referen"    -0.46
    "idated"      -0.46
    "ebook"       -0.45
    "arsity"      -0.44
    " CLSID"      -0.44
    " Moroc"      -0.44
    "代"          -0.44
    " Polo"       -0.44
    Positive Logits
    Token         Logit
    "etheless"    0.69
    "ktop"        0.64
    "mosp"        0.64
    "resa"        0.62
    "swers"       0.62
    "xiety"       0.59
    "zbollah"     0.58
    "Voice"       0.56
    "jriwal"      0.54
    "notations"   0.54
    Activation Density: 0.589%

    No Known Activations