INDEX
    Explanations

    instructions or commands to follow

    instances of the word "follow."

    NEGATIVE LOGITS
    "pite"           -0.82
    "inese"          -0.78
    "Newsletter"     -0.71
    "ãĥĨãĤ£"         -0.71
    "cci"            -0.69
    "risome"         -0.68
    " Scotia"        -0.68
    "inished"        -0.66
    " ILCS"          -0.66
    "urrection"      -0.64
    POSITIVE LOGITS
    " directions"    0.86
    " closely"       0.80
    "follow"         0.75
    " suit"          0.75
    "ansen"          0.74
    "itored"         0.73
    " blindly"       0.71
    " Follow"        0.71
    " behav"         0.69
    " obedient"      0.68
    Activation Density: 0.036%

    No Known Activations