INDEX
    Explanations

    instructions or options for taking action

    phrases indicating options or alternatives

    Negative Logits
    ocracy          -0.79
    Therefore       -0.75
     correctness    -0.73
    hed             -0.69
    eness           -0.69
    Merit           -0.67
    Thus            -0.59
    emen            -0.58
    ilty            -0.57
     matters        -0.57
    Positive Logits
     alternatively  1.35
    chard           1.09
    lando           1.08
    acles           1.05
    Else            1.04
    acle            1.01
     browse         0.99
     else           0.99
    chid            0.92
    GAN             0.92
    Activation Density: 0.095%

    No Known Activations