INDEX
    Explanations

    phrases indicating goals or intended outcomes

    Negative Logits
     Schroeder    -0.70
     The          -0.65
    ogeneous      -0.61
     '            -0.59
     "            -0.59
    footnote      -0.58
                  -0.58
    <eos>         -0.58
     C            -0.57
     den          -0.56
    Positive Logits
     aim           3.11
     Aim           2.96
    Aim            2.89
    aim            2.76
     Aims          2.61
     aims          2.58
     AIM           2.28
    Aims           2.28
     aiming        2.25
     aimed         2.18
    Act Density 0.059%

    No Known Activations