    Explanations

    phrases that include confirmations, warnings, or prompts indicating user actions or limits

    Negative Logits
    Token        Logit
    ↵↵           -0.75
    <eos>        -0.71
     –           -0.61
                 -0.52
                 -0.49
    WHEN         -0.47
    ↵↵↵↵         -0.46
    EnableWeb    -0.43
     Ibid        -0.43
    ↵↵↵↵↵        -0.42

    Positive Logits
    Token        Logit
    \'           1.25
    !");         1.24
    !\           1.19
    .");         1.14
    :");         1.13
    !');         1.13
    ...");       1.09
    !";          1.09
    !')          1.07
    .');         1.06
    Act Density 0.446%

    No Known Activations