INDEX
    Explanations

    commands or instructions starting with "First,"

    introductory phrases or transitions in text

    Negative Logits
    Token        Logit
    "abled"      -0.75
    "driving"    -0.72
    "rams"       -0.72
    "oslav"      -0.71
    "nes"        -0.70
    "ildo"       -0.68
    "bd"         -0.67
    "adv"        -0.66
    "aden"       -0.66
    "lain"       -0.66
    Positive Logits
    Token                 Logit
    " congratulations"    0.89
    " let"                0.88
    " introdu"            0.87
    " congr"              0.78
    " apologize"          0.71
    " Introduction"       0.70
    " lets"               0.70
    " apologies"          0.68
    " FIX"                0.67
    " suppose"            0.66
    Act Density 0.080%

    No Known Activations